
New AI model “learns” to simulate Super Mario Bros. from video footage


At first glance, these AI-generated Super Mario Bros. videos are pretty impressive. However, the more you watch them, the more flaws you'll find.

Last month, Google's GameNGen artificial intelligence model demonstrated that generalized image diffusion techniques can be used to generate an acceptable, playable version of Doom. Now, researchers are using similar techniques with a model called MarioVGG to see whether an AI model can generate plausible video of Super Mario Bros. in response to user input.

The results from the MarioVGG model (available as a preprint published by cryptocurrency-linked AI firm Virtuals Protocol) still have many glaring flaws, and it’s currently too slow for anything approaching real-time gameplay. But the results show how even a limited model can infer impressive gameplay dynamics and physics just by studying a bit of video and input data.

The researchers hope this will represent a first step towards “producing and demonstrating a reliable and controllable video game generator,” or possibly even “completely replacing game development and game engines using video generation models” in the future.

Watching 737,000 frames of Mario

To train their model, the MarioVGG researchers (GitHub users erniechew and Brian Lim are listed as contributors) started with a public dataset of Super Mario Bros. gameplay containing 280 “levels” of input and image data organized for machine learning purposes (level 1-1 was removed from the training data so that images from it could be used for evaluation). The 737,000+ individual frames in that dataset were “preprocessed” into 35-frame chunks so that the model could begin to learn what the immediate results of various inputs generally looked like.
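As a rough illustration of that chunking step, the sketch below slides a 35-frame window over each level's frames and paired inputs while holding level 1-1 out for evaluation. The function and variable names are assumptions for illustration, not code from the paper.

```python
# A minimal sketch of the chunking step, assuming frames and per-frame inputs
# are already loaded as parallel lists; names here are illustrative only.
from typing import Iterator, List, Tuple

CHUNK_LEN = 35  # each training sample covers 35 consecutive frames

def chunk_frames(frames: List, actions: List, chunk_len: int = CHUNK_LEN) -> Iterator[Tuple[List, List]]:
    """Slide a fixed-length window over one level's frames and matching inputs."""
    for start in range(0, len(frames) - chunk_len + 1):
        yield frames[start:start + chunk_len], actions[start:start + chunk_len]

def build_training_chunks(dataset: dict) -> list:
    """Hold out level 1-1 for evaluation and chunk everything else for training.

    `dataset` maps a level id to its (frames, actions) pair.
    """
    return [chunk for level_id, (frames, actions) in dataset.items()
            if level_id != "1-1"
            for chunk in chunk_frames(frames, actions)]
```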


To “simplify the game situation,” the researchers decided to focus on just two possible inputs from the dataset: “run right” and “run right and jump.” However, even this limited set of movements presented some difficulties for the machine learning system, as the preprocessor had to look back a few frames before a jump to figure out if and when the “run” started. Any jumps that included mid-air adjustments (i.e., the “left” button) also had to be discarded because “this would introduce noise into the training dataset,” the researchers write.
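That filtering might look something like the following sketch. The button names, the five-frame lookback, and the exact discard rules are assumptions rather than details from the preprint.

```python
# Illustrative sketch of the input-filtering rules described above; the exact
# button names, lookback window, and discard criteria are assumptions.
from typing import List, Optional, Set

LOOKBACK = 5  # frames to look back before a jump to see whether a "run" preceded it

def label_chunk(action_frames: List[Set[str]]) -> Optional[str]:
    """Map a chunk's per-frame button sets to 'run' or 'jump'; None means discard."""
    for i, buttons in enumerate(action_frames):
        if "jump" in buttons:
            # Discard jumps with mid-air adjustments (e.g. a "left" press after
            # takeoff), since these would add noise to the training data.
            if any("left" in later for later in action_frames[i:]):
                return None
            # Look back a few frames to confirm the jump launched out of a rightward run.
            window = action_frames[max(0, i - LOOKBACK):i]
            return "jump" if any("right" in earlier for earlier in window) else None
    # No jump anywhere: keep the chunk only if it is a plain run to the right.
    return "run" if all("right" in buttons for buttons in action_frames) else None
```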

After preprocessing (and about 48 hours of training on a single RTX 4090 graphics card), the researchers used a standard convolution and denoising process to generate new video frames from an initial static in-game image and some text input (either “running” or “jumping” in this limited case). While these generated sequences are only a few frames long, the last frame of a sequence can be used as the first frame of a new sequence, allowing for the creation of gameplay videos of any length that still show “coherent, consistent gameplay,” according to the researchers.
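That chaining trick is straightforward to picture in code. The sketch below assumes a hypothetical model.generate() call that returns a short clip of frames (the real MarioVGG interface may differ) and simply feeds each clip's final frame back in as the seed for the next one.

```python
# A minimal sketch of chaining short generated clips into a longer video,
# assuming a hypothetical model.generate(first_frame, prompt) that returns
# a short list of frames.
def generate_long_video(model, start_frame, prompts):
    """Stitch short generated clips into one video by reusing each clip's last frame."""
    video = [start_frame]
    frame = start_frame
    for prompt in prompts:  # e.g. ["run", "run", "jump", "run"]
        clip = model.generate(first_frame=frame, prompt=prompt)  # hypothetical API
        video.extend(clip)
        frame = clip[-1]  # the last generated frame seeds the next sequence
    return video
```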

