Playing Flappy Bird With AI

Ariel Liu
7 min read · Jan 8, 2020

Flappy Bird is an iconic yet remarkably simple game that provides a lot of frustration.

A player simply taps the screen to fly a pixelated bird up and down to pass between a series of randomly generated pipes. Yet it still proves to be remarkably hard, at least for me…

I got frustrated at the game pretty quickly. So I figured if I can’t play the game…

I’ll code something to play it.

So how did I do it?

I chose to create a convolutional deep Q-network (DQN) and train it with reinforcement learning to play the game for as long as possible. (Deep Q-networks are actually what Google’s DeepMind used to play Atari Breakout.) I used Pygame to generate the actual Flappy Bird game and built my neural network in Jupyter Notebooks.

The Game Environment 🎮

I chose to code the Flappy Bird environment using Pygame, a Python library designed for creating games.

I found many Pygame versions of Flappy Bird on Github and ended up choosing this one.

I used the sprites and other visuals in my project, along with some of the code for the game, which I tweaked to fit my needs.

I implemented rewards for the neural network when the bird survived a bit longer (+0.1) and when it passed a pipe (+1), along with a penalty (-1) for when the bird dies. I’ll address all of this later on when discussing my implementation of reinforcement learning.
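
To make this concrete, here is a minimal sketch of how that reward assignment might look inside the game loop. The flags crashed and passed_pipe are hypothetical names for illustration, not taken from the actual code.

# A minimal sketch of the per-frame reward logic (flag names are illustrative)
def get_reward(crashed, passed_pipe):
    if crashed:
        return -1.0   # penalty when the bird dies
    if passed_pipe:
        return 1.0    # reward for passing a pipe
    return 0.1        # small reward for surviving another frame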

Now let’s go onto the core of the challenge…

The Core Network 💻

How does Reinforcement learning work? 📚

Reinforcement learning is all about training an agent to better interact with its environment. Agents perform actions in the environment which yield different scenarios called states.

The agent receives rewards, which are the network’s main source of feedback. These rewards are positive, negative, or even zero depending on the state.

An agent’s goal is to maximize the total reward collected in one episode, which consists of all the states from the initial state to the terminal state. From this feedback loop, the agent is able to develop a working policy.

To sum it all up…

  • Agent ~ The protagonist that interacts with the game environment
  • Environment ~ A fancier word for the game world
  • Actions ~ Things that the agent does in the game
  • States ~ Scenarios in an environment
  • Rewards ~ Tells the algorithm how well the agent is doing
  • Episode ~ Everything between the initial and terminal states
  • Policy ~ The strategy developed by the agent to win
[My diagram of the reinforcement learning feedback loop]

The tricky part of reinforcement learning is that the agent needs to learn to take specific actions to collect more rewards, just like how the Flappy Bird agent needs to flap at just the right time to pass through a pipe. The agent has to learn to indirectly link the reward with the action, and to cope with delayed rewards.
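
One common way to formalize delayed rewards (this is standard Q-learning background rather than anything specific to my code) is the discounted return, where future rewards count for a bit less than immediate ones:

G(t) = r(t) + γ·r(t+1) + γ²·r(t+2) + …

The discount factor γ sits between 0 and 1 (often around 0.99), so the agent still values rewards many steps away, just less than the reward right in front of it.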

If you haven’t read my article on types of machine learning, read it here to get up to speed!

What are convolutional networks? 🦾

Convolutional neural networks are a type of deep learning network that focuses on image recognition and computer vision. CNNs are a huge topic and to go in-depth would be an article in and of itself (which I will write further down the road).

CNNs are constructed of four main layers:

  • Convolutional Layer — this layer is responsible for identifying the features in an image
[Diagram of a convolutional filter]

Imagine a flashlight (the filter) sweeping across the input image; each section the flashlight shines on is a receptive field. The filter contains an array of weights that are multiplied with the pixel values in the receptive field, and the results are summed to form a feature map.

  • Rectified Linear Unit Layer — the ReLU layer zeroes out negative activations for better feature extraction
[Diagram of the ReLU activation function]

The ReLU layer applies the activation function

f(x) = max(0, x)

which means that all negative activations are changed to zero: relu(x) = x if x ≥ 0, and 0 otherwise.

  • Pooling layer — reduces the number of parameters in the input (concentrates the input information)

This is a downsampling layer: a filter sweeps the input volume and returns the highest number in every section it passes through. For example, if a subregion contained the values 0, 1, 3, and 6, max pooling would reduce that subregion to just the highest value, 6.

  • Fully Connected layer — the final standard feed-forward neural network

The fully connected layer takes the input from the pooling layer before it and outputs a vector with probabilities for each of the classes.
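
Putting the four layers together, here is a minimal PyTorch sketch of this kind of network. The layer sizes are illustrative, not the exact ones I used; the input is assumed to be a stack of four 84×84 grayscale frames (more on that below).

import torch
import torch.nn as nn

# A minimal CNN sketch: conv -> ReLU -> pool -> fully connected.
# Input: a stack of 4 grayscale 84x84 frames; output: one value per action.
class SmallCNN(nn.Module):
    def __init__(self, num_actions=2):
        super().__init__()
        self.conv = nn.Conv2d(4, 16, kernel_size=8, stride=4)  # feature extraction
        self.relu = nn.ReLU()                                  # zero out negatives
        self.pool = nn.MaxPool2d(2)                            # downsample
        self.fc = nn.Linear(16 * 10 * 10, num_actions)         # final feed-forward layer

    def forward(self, x):
        x = self.pool(self.relu(self.conv(x)))
        return self.fc(x.flatten(start_dim=1))

# Usage: a batch of one stacked state produces one value per action
net = SmallCNN()
q_values = net(torch.zeros(1, 4, 84, 84))  # shape: (1, 2)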

Training the Network to Play Flappy Bird 🎓

What is the Epsilon-greedy policy? 🎲

There are two important factors to consider in our reinforcement learning algorithm: exploration (the agent explores possibilities) and exploitation (the agent exploits known rewards).

The tradeoff of exploration versus exploitation is crucial. What percentage of time should our algorithm attempt new directions, compared to gaining known rewards?

The standard strategy is the epsilon-greedy policy, where the algorithm starts with a high level of exploration and, as it matures, focuses more on exploitation, reducing the amount of exploration. The level of exploration is represented by epsilon, which indicates the percentage of the time that the agent randomly selects an action.

import random
import torch

# Epsilon-greedy policy:
# with probability epsilon we take a random action (exploration),
# otherwise we select the action with the highest predicted value (exploitation).
# Epsilon starts high, so there is more exploration early on and less later.
def epsilon_g(est, epsilon, num_action):
    def policy(state):
        # If a random float is less than epsilon,
        # return a random index corresponding to an action
        if random.random() < epsilon:
            return random.randint(0, num_action - 1)  # Exploration
        # Otherwise, send the state into the estimator and
        # return the best known action based on its predictions
        else:
            q_values = est.predict(state)
            return torch.argmax(q_values).item()  # Exploitation
    return policy
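
Since epsilon should shrink as training progresses, a common approach (a sketch of one possible schedule, not necessarily the one I used) is to anneal it linearly from a high starting value down to a small floor, then pass the current epsilon into epsilon_g at each iteration:

# Linear epsilon annealing: decay from eps_start to eps_end over n_steps
def anneal_epsilon(step, eps_start=1.0, eps_end=0.01, n_steps=1_000_000):
    fraction = min(step / n_steps, 1.0)
    return eps_start + fraction * (eps_end - eps_start)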

Setting up the Input States ⌨️

Along with setting up the epsilon-greedy policy, there are other small preparations, such as seeding the random number generator and setting up the game environment.

After which we need to set up the process for resizing the images, pre-processing them, and stitching them together to feed into the CNN as input (training data).

Note: Images are sent into the CNN in grayscale format.
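
Here is a minimal sketch of what that pre-processing might look like; the 84×84 size is a common choice borrowed from DeepMind’s Atari work, not necessarily my exact dimensions.

import cv2
import numpy as np

# Convert a raw RGB frame into a small grayscale image the CNN can digest
def pre_process(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)   # drop the color channels
    small = cv2.resize(gray, (84, 84))               # shrink to a fixed size
    scaled = small.astype(np.float32) / 255.0        # scale pixels to [0, 1]
    return scaled[None, :, :]                        # add a channel axis for stacking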

# State made from stitching 4 frames together:
# one frame is not enough information, so we use four adjacent steps
pic = torch.from_numpy(pic)
state = torch.cat(tuple(pic for _ in range(4)))[None, :, :, :]
# When I only have the first frame, I just repeat it 4x

The Training of the Model 💡

In summary, during training a frame is pulled from the environment, processed, then stitched with its neighbors to form the state (the input). The input is sent into the CNN, the next action is decided by the epsilon-greedy policy, and the losses are calculated and used to update the model. I repeated this process over many iterations, in my case 2,000,000 of them.
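
For readers who want to see the shape of that update, here is a simplified sketch of a single deep Q-learning step. The names model, optimizer, and gamma are assumptions, and details like the replay memory are omitted.

import torch
import torch.nn.functional as F

# One simplified DQN update: move Q(s, a) toward the Bellman target
# r + gamma * max_a' Q(s', a'), where gamma is the discount factor.
def train_step(model, optimizer, state, action, reward, next_state, done, gamma=0.99):
    q_value = model(state)[0, action]              # Q-value of the action taken
    with torch.no_grad():
        next_q = model(next_state).max()           # best Q-value in the next state
    target = reward + gamma * next_q * (1 - done)  # no future reward if terminal
    loss = F.mse_loss(q_value, target)             # squared error to the target
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()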

Testing the Network and Playing Flappy Bird 🏆

After training my network for over 18 hours (with many computer crashes), I pulled out my final saved model (whom I named Cinna) and played a few episodes to see Cinna’s performance. I was finally able to achieve an average score of 250 in Flappy Bird with my algorithm!

Other Applications ⚙️

CNNs aren’t just useful for reinforcement learning and playing games; they have lots of real-world applications as well! Convolutional neural networks were designed as an image recognition and processing tool, meaning they were built to process image pixel data and map out complex non-linear connections.

Their main abilities are:

  1. Group ~ sort alike objects and cluster them together
  2. Recognize ~ understand the objects within an image
  3. Classify ~ classify the visual of an object and sort it into a category

Here are a few examples of possible applications of CNNs:

  • Healthcare — Accurate medical image analysis
  • Advertising — Personalized advertisements based on data
  • Document Reader — Handwriting analysis, scanning someone’s writing
  • Facial Recognition — Breaking down aspects of an individual’s face
  • Climate — Understanding weather patterns better
  • Autonomous Vehicles — Object detection in self-driving vehicles

Reinforcement learning gets a lot of press and media attention from all the game-playing it has done, but its reach goes further. Since it teaches algorithms to determine ideal behaviors in particular situations, it can maximize effectiveness, making it extremely useful in a wide variety of settings.

Here are a few examples of possible applications of Reinforcement Learning:

  • Traffic Light Control — Solving the problem of traffic congestion
  • Robotics — Training robots to learn actions from videos
  • Chemistry — Optimizing chemical reactions to reveal underlying mechanisms
  • Web System Configuration — Configuring parameters and fine-tuning the system
  • Advertisements — Displaying the most relevant ads to viewers
  • Autonomous Vehicles — Speeding up development cycles (something I’m really interested in)

Key Takeaways 📌

  • Convolutional neural networks are an image recognition and processing tool, designed to process image pixel data
  • Convolutional neural networks allow for faster and more efficient analysis of images based on features
  • Reinforcement learning teaches algorithms to determine ideal behaviors in particular situations, using rewards as the guideline

I loved coding this project, and I hope you enjoyed following along! I encourage you to try it out and code your own projects with machine learning!

If you want to read more articles in the future, give my account a follow!

In the meantime, feel free to contact me at ariel.yc.liu@gmail.com or connect with me on LinkedIn.

You can also read my monthly newsletter here!

Or visit my personal website to see my full portfolio here!

Till next time! 👋
