Home

We set out to create a reinforcement learning agent, built on a neural network, that plays through level 1-1 of the original NES Mario. Our training data consists of frames generated dynamically from a Python implementation of the original NES Mario, which serve as the model's inputs.
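For a concrete picture of how those frames can be generated, here is a minimal sketch. It assumes the gym-super-mario-bros package as the Python implementation of NES Mario and the classic gym step API; our actual environment setup may differ.

```python
# A minimal sketch of generating frames dynamically. We assume the
# gym-super-mario-bros package (a Python recreation of NES Mario) and the
# classic gym step API; the project's actual environment setup may differ.
import gym_super_mario_bros
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT
from nes_py.wrappers import JoypadSpace

env = gym_super_mario_bros.make("SuperMarioBros-1-1-v0")  # level 1-1 only
env = JoypadSpace(env, SIMPLE_MOVEMENT)

done = True
for _ in range(1800):  # roughly 30 seconds of play at 60 frames per second
    if done:
        state = env.reset()
    # `state` is the rendered frame; a random action stands in for the policy.
    state, reward, done, info = env.step(env.action_space.sample())
env.close()
```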

The number of frames the model trains on depends on the round. A single playthrough of the level takes at least 29.33 seconds, the world record for 1-1 as of 10/23/2025. Rounding up for simpler math: 30 seconds * 60 frames per second = 1,800 frame inputs from that single attempt at the level. We feed in six frames at a time so the model can infer velocity, so the number of input features is 6 channels (6 grayscale images) * 60 rows * 80 columns = 28,800. Through reinforcement learning, the model seeks a policy that maximizes the reward function by playing the level repeatedly. The reward is based on different criteria depending on how Mario performs in the level.
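To make the six-frame input concrete, here is a sketch of the preprocessing described above: each rendered frame is converted to grayscale, downscaled to 60x80, and the six most recent frames are stacked into a single observation. The `preprocess_frame` and `FrameStack` names are illustrative rather than the exact code in this repo, and OpenCV is assumed for resizing.

```python
from collections import deque

import cv2  # OpenCV, assumed available for grayscale conversion and resizing
import numpy as np


def preprocess_frame(rgb_frame: np.ndarray) -> np.ndarray:
    """Convert one rendered RGB frame to a 60x80 grayscale image in [0, 1]."""
    gray = cv2.cvtColor(rgb_frame, cv2.COLOR_RGB2GRAY)
    small = cv2.resize(gray, (80, 60), interpolation=cv2.INTER_AREA)  # (width, height)
    return small.astype(np.float32) / 255.0


class FrameStack:
    """Keeps the six most recent frames so the model can infer velocity."""

    def __init__(self, num_frames: int = 6):
        self.frames = deque(maxlen=num_frames)

    def reset(self, first_frame: np.ndarray) -> np.ndarray:
        frame = preprocess_frame(first_frame)
        for _ in range(self.frames.maxlen):
            self.frames.append(frame)
        return self.observation()

    def step(self, new_frame: np.ndarray) -> np.ndarray:
        self.frames.append(preprocess_frame(new_frame))
        return self.observation()

    def observation(self) -> np.ndarray:
        # Shape (6, 60, 80) -> 6 * 60 * 80 = 28,800 input features.
        return np.stack(self.frames, axis=0)
```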

The problem of beating Mario boils down to a classification problem on the current game state: the game's rendered display determines the next action for Mario (jump, left, or right). More precisely, it is multi-label (multi-binary) classification, since the action space lets the model move horizontally and jump at the same time (the RL agent predicts jump and right/left as independent classes simultaneously).
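Because more than one button can be pressed per frame, the output layer is a set of independent binary classifiers (one sigmoid per button) rather than a single softmax over actions. Below is a minimal PyTorch sketch of that idea; the `MarioPolicy` name and the layer sizes are assumptions for illustration, not our exact architecture.

```python
import torch
import torch.nn as nn


class MarioPolicy(nn.Module):
    """Maps a stack of 6 grayscale frames to independent button probabilities."""

    def __init__(self, num_buttons: int = 3):  # jump, left, right
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(6, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():
            flat = self.conv(torch.zeros(1, 6, 60, 80)).shape[1]
        self.head = nn.Sequential(
            nn.Linear(flat, 256), nn.ReLU(),
            nn.Linear(256, num_buttons),  # one logit per button, no softmax
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        logits = self.head(self.conv(frames))
        # Each button is an independent Bernoulli: press it if probability > 0.5.
        return torch.sigmoid(logits)


# Example: one 6-frame observation -> per-button press probabilities.
policy = MarioPolicy()
obs = torch.rand(1, 6, 60, 80)
print(policy(obs))  # e.g. tensor([[0.52, 0.47, 0.61]]) before training
```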

Mario Demo

How the model works

TLDR: The agent is a neural network trained with reinforcement learning. It takes the six most recent grayscale frames of the game as input and predicts which buttons to press (jump, left, right), learning a policy that maximizes a reward based on how well Mario progresses through level 1-1.

TLDR aside, let's get into some of the details!

The details below cover the model itself, training (DDQN vs. PPO), and other ideas we explored.

Citations