Home

We set out to create a reinforcement learning agent, built on a neural network, that plays through level 1-1 of the original NES Mario. Our training data consists of frames generated dynamically from a Python implementation of the original NES Mario, which serve as the model's inputs.
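For a concrete picture of how those frames can be generated, here is a minimal sketch. It assumes the gym-super-mario-bros package as the Python implementation of NES Mario and the classic gym step API; our actual environment setup may differ.

```python
# A minimal sketch of generating frames dynamically. We assume the
# gym-super-mario-bros package (a Python recreation of NES Mario) and the
# classic gym step API; the project's actual environment setup may differ.
import gym_super_mario_bros
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT
from nes_py.wrappers import JoypadSpace

env = gym_super_mario_bros.make("SuperMarioBros-1-1-v0")  # level 1-1 only
env = JoypadSpace(env, SIMPLE_MOVEMENT)

done = True
for _ in range(1800):  # roughly 30 seconds of play at 60 frames per second
    if done:
        state = env.reset()
    # `state` is the rendered frame; a random action stands in for the policy.
    state, reward, done, info = env.step(env.action_space.sample())
env.close()
```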

The number of frames the model trains on depends on the round. A single playthrough of the level takes at least 29.33 seconds, the world record for 1-1 as of 10/23/2025. Rounding up for simpler math: 30 seconds * 60 frames per second = 1,800 frame inputs from that single attempt at the level. We feed in six frames at a time so the model can infer velocity, so the number of input features is 6 channels (6 grayscale images) * 60 rows * 80 columns = 28,800. Through reinforcement learning, the model seeks a policy that maximizes the reward function by playing the level repeatedly. The reward is based on different criteria depending on how Mario performs in the level.
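To make the six-frame input concrete, here is a sketch of the preprocessing described above: each rendered frame is converted to grayscale, downscaled to 60x80, and the six most recent frames are stacked into a single observation. The `preprocess_frame` and `FrameStack` names are illustrative rather than the exact code in this repo, and OpenCV is assumed for resizing.

```python
from collections import deque

import cv2  # OpenCV, assumed available for grayscale conversion and resizing
import numpy as np


def preprocess_frame(rgb_frame: np.ndarray) -> np.ndarray:
    """Convert one rendered RGB frame to a 60x80 grayscale image in [0, 1]."""
    gray = cv2.cvtColor(rgb_frame, cv2.COLOR_RGB2GRAY)
    small = cv2.resize(gray, (80, 60), interpolation=cv2.INTER_AREA)  # (width, height)
    return small.astype(np.float32) / 255.0


class FrameStack:
    """Keeps the six most recent frames so the model can infer velocity."""

    def __init__(self, num_frames: int = 6):
        self.frames = deque(maxlen=num_frames)

    def reset(self, first_frame: np.ndarray) -> np.ndarray:
        frame = preprocess_frame(first_frame)
        for _ in range(self.frames.maxlen):
            self.frames.append(frame)
        return self.observation()

    def step(self, new_frame: np.ndarray) -> np.ndarray:
        self.frames.append(preprocess_frame(new_frame))
        return self.observation()

    def observation(self) -> np.ndarray:
        # Shape (6, 60, 80) -> 6 * 60 * 80 = 28,800 input features.
        return np.stack(self.frames, axis=0)
```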

The problem of beating Mario boils down to a classification problem on the current game state: the game's rendered display determines the next action for Mario (jump, left, or right). More precisely, it is multi-label (multi-binary) classification, since the action space lets the model move horizontally and jump at the same time (the RL agent predicts jump and right/left as independent classes simultaneously).
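Because more than one button can be pressed per frame, the output layer is a set of independent binary classifiers (one sigmoid per button) rather than a single softmax over actions. Below is a minimal PyTorch sketch of that idea; the `MarioPolicy` name and the layer sizes are assumptions for illustration, not our exact architecture.

```python
import torch
import torch.nn as nn


class MarioPolicy(nn.Module):
    """Maps a stack of 6 grayscale frames to independent button probabilities."""

    def __init__(self, num_buttons: int = 3):  # jump, left, right
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(6, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():
            flat = self.conv(torch.zeros(1, 6, 60, 80)).shape[1]
        self.head = nn.Sequential(
            nn.Linear(flat, 256), nn.ReLU(),
            nn.Linear(256, num_buttons),  # one logit per button, no softmax
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        logits = self.head(self.conv(frames))
        # Each button is an independent Bernoulli: press it if probability > 0.5.
        return torch.sigmoid(logits)


# Example: one 6-frame observation -> per-button press probabilities.
policy = MarioPolicy()
obs = torch.rand(1, 6, 60, 80)
print(policy(obs))  # e.g. tensor([[0.52, 0.47, 0.61]]) before training
```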

Mario Demo

How the model works

TLDR: The agent is a neural network trained with reinforcement learning. It takes the six most recent grayscale frames of the game as input and predicts which buttons to press (jump, left, right), learning a policy that maximizes a reward based on how well Mario progresses through level 1-1.

TLDR aside, let's get into some of the details!

The details below cover the model itself, training (DDQN vs. PPO), and other ideas we explored.

Citations