Game asset creation with generative AI tools: A tutorial

In game development, creating art assets is often estimated to account for about 30-40% of the total game budget. This makes game asset creation one of the most important applications for generative AI tools and services. In this blog post, we explore the capabilities of Midjourney and Scenario, two public text-to-image generative AI services, for art asset creation in game development. We build on the methodology outlined in this thread on how to precisely design a small building in a game, but perform a more systematic analysis of how individual asset properties can be controlled, and develop several new techniques that simplify the workflow and increase expressiveness.

Generating the initial designs

For this use case, we assume a real-time strategy game set in the near future, and explore the generation of various isometric building sprites for it. We start by generating the initial designs in Midjourney using only a text prompt, and then investigate how these initial designs can be modified.

Building types

Midjourney V4 does an excellent job generating isometric building designs based on basic prompts that contain only the building type and several style instructions:

isometric oil refinery, cyberpunk style, realistic, video game, style of behance, made in blender 3D, white background
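The prompt structure above (a building type followed by a fixed list of style instructions) can be sketched as a small helper. This is only an illustration that assembles the prompt text; the function and constant names are hypothetical, and submitting the prompt to Midjourney is out of scope.

```python
from typing import Sequence

# Style instructions copied from the prompts used in this post.
STYLE_TAGS = (
    "cyberpunk style",
    "realistic",
    "video game",
    "style of behance",
    "made in blender 3D",
    "white background",
)


def build_prompt(building: str, style_tags: Sequence[str] = STYLE_TAGS) -> str:
    """Join 'isometric <building>' with the comma-separated style tags."""
    return ", ".join([f"isometric {building}", *style_tags])


# One prompt per building type, all sharing the same style suffix.
prompts = [build_prompt(b) for b in ("oil refinery", "large nuclear plant")]
```

Keeping the style suffix in one place makes it easier to change the style of every building prompt at once.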

The designs generated from the basic prompts have relatively high variability in style even if the chaos parameter is set to the minimum. To create several buildings of approximately the same style, we usually need to generate dozens of designs and manually pick the matching variants.

isometric large nuclear plant, cyberpunk style, realistic, video game, style of behance, made in blender 3D, white background --chaos 1
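To compare how the chaos setting affects variability, one option is to prepare the same prompt at several chaos levels. The sketch below only builds the prompt strings; the helper names are hypothetical, and the 0 to 100 range is taken from Midjourney's documentation of the chaos parameter rather than from this post.

```python
def with_chaos(prompt: str, chaos: int) -> str:
    """Append the --chaos parameter to a prompt (assumed valid range: 0 to 100)."""
    if not 0 <= chaos <= 100:
        raise ValueError("chaos must be between 0 and 100")
    return f"{prompt} --chaos {chaos}"


def chaos_sweep(prompt: str, levels=(0, 10, 50)) -> list:
    """Return the same prompt at several chaos levels for side-by-side comparison."""
    return [with_chaos(prompt, level) for level in levels]


sweep = chaos_sweep("isometric large nuclear plant, cyberpunk style, white background")
```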

This basic approach also works well for more complex prompts that combine several instructions about the building structure or use relatively uncommon or ambiguous concepts and terms:

isometric tower with a radar dome on the top, cyberpunk style, realistic, video game, style of behance, made in blender 3D, white background
isometric soviet military bunker, cyberpunk style, realistic, video game, style of behance, made in blender 3D, white background

Wear and tear

The examples below demonstrate how the “default” designs can be customized with various types of damage and degradation. The variability of the generated designs decreases as we add more specific instructions about the style and scene layout, so it becomes easier to create a collection of buildings with a certain consistent style.

isometric factory, rust leaks, dirty old paint, moss, cyberpunk style, realistic, video game, style of behance, made in blender 3D, white background
isometric old ruined concrete factory, abandoned, dust and rust, tons of metal rusty scrap scattered around, post-apocalyptic cyberpunk style, realistic, video game, style of behance, made in blender 3D, white background
isometric ruins of a completely destroyed factory building, many large concrete fragments and tons of metal rusty scrap scattered around, abandoned, post-apocalyptic cyberpunk style, realistic, video game, style of behance, made in blender 3D, white background

Weather and light

Similar to wear and tear, we can control the weather and lighting conditions. The instructions that specify the season, weather, and light also provide a powerful way to decrease the style variability and generate consistent collections of buildings.

isometric square lot with a concrete bunker in the center with a large radar dish on the top, dramatic sunset lighting, long deep shadows, concrete debris and steel boxes, abandoned, post-apocalyptic cyberpunk style, realistic, video game, style of behance, made in blender 3D, white background
isometric square lot with a concrete bunker in the center with a large radar dish on the top, heavy rain and dense fog, concrete debris and metal boxes, abandoned, post-apocalyptic cyberpunk style, realistic, video game, style of behance, made in blender 3D, white background
isometric square lot with a concrete bunker in the center with a large radar dish on the top, dark night, glowing lights, deep freeze and deep snow, concrete debris and metal boxes, abandoned, post-apocalyptic cyberpunk style, realistic, video game, style of behance, made in blender 3D, white background
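The three prompts above share the same scene description and style suffix and differ only in the weather and lighting clause. A minimal sketch of that pattern, with hypothetical names, keeps the shared parts fixed and swaps in one clause per variant:

```python
# Shared scene description and style suffix, copied from the prompts above.
BASE = "isometric square lot with a concrete bunker in the center with a large radar dish on the top"
TAIL = (
    "abandoned, post-apocalyptic cyberpunk style, realistic, video game, "
    "style of behance, made in blender 3D, white background"
)

# One weather/lighting clause per variant (the dictionary keys are arbitrary labels).
CONDITIONS = {
    "sunset": "dramatic sunset lighting, long deep shadows, concrete debris and steel boxes",
    "rain": "heavy rain and dense fog, concrete debris and metal boxes",
    "winter_night": "dark night, glowing lights, deep freeze and deep snow, concrete debris and metal boxes",
}


def variants(base: str, conditions: dict, tail: str) -> dict:
    """Build one prompt per condition by inserting its clause between base and tail."""
    return {name: f"{base}, {clause}, {tail}" for name, clause in conditions.items()}


weather_prompts = variants(BASE, CONDITIONS, TAIL)
```

The same template works for the wear-and-tear clauses from the previous section, since they occupy the same position in the prompt.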

Colors

Controlling minor details, and colors in particular, can be a challenging problem. Consider the following baseline designs:

isometric perfectly white office building, winter, deep white snow, sunrise, cyberpunk style, realistic, video game, style of behance, made in blender 3D, white background

Extending this baseline prompt by requesting minor colored elements results in major changes of the overall color theme:

isometric perfectly white office building with a red flag on the top, winter, deep white snow, sunrise, cyberpunk style, realistic, video game, style of behance, made in blender 3D, white background

These correlations are generally difficult to suppress using negative prompts (as illustrated in the examples below) and other prompt engineering techniques. However, these issues can be addressed by combining manual image editing and reference-based generation, as we discuss later in this blog post.

isometric perfectly white office building with a large red flag on the top, winter, deep white snow, sunrise, cyberpunk style, realistic, video game, style of behance, made in blender 3D, white background :: red::-.2
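The negative prompt above uses Midjourney's multi-prompt syntax: the main prompt keeps its default weight of 1, and the unwanted term is appended with a negative weight. A hedged sketch of building such a prompt follows; the helper name is hypothetical, and the check that the weights must sum to a positive number reflects Midjourney's documented rule for prompt weights, not anything tested in this post.

```python
def add_negative_term(prompt: str, term: str, weight: float = -0.2) -> str:
    """Append '<term>::<negative weight>' to a prompt that has the default weight 1."""
    if weight >= 0:
        raise ValueError("a negative term needs a negative weight")
    # The main prompt has an implicit weight of 1; the total must stay positive.
    if 1 + weight <= 0:
        raise ValueError("the sum of all prompt weights must be positive")
    return f"{prompt} :: {term}::{weight}"


negative = add_negative_term(
    "isometric perfectly white office building with a large red flag on the top",
    "red",
)
```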

Quantitative details

Controlling the quantitative parameters of the building and scene is also a challenging task, especially for complex prompts with multiple instructions. In many cases, the desirable results can be achieved but with a low success rate (only a small percentage of the generated designs meet the specification):

isometric square lot with two small concrete bunkers in the center, each bunker has a large radar dish on the top, concrete debris and metal boxes, abandoned, post-apocalyptic cyberpunk style, realistic, video game, style of behance, made in blender 3D, white background
isometric office building with three windows, concrete debris and metal boxes, abandoned, post-apocalyptic cyberpunk style, realistic, video game, style of behance, made in blender 3D, white background
isometric office building with twelve windows, concrete debris and metal boxes, abandoned, post-apocalyptic cyberpunk style, realistic, video game, style of behance, made in blender 3D, white background
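A low success rate translates directly into the number of images that have to be generated and reviewed. As a rough planning aid, assuming each generated design independently meets the specification with probability p (an assumption, and the values of p below are illustrative, not measured), the expected number of attempts is 1/p, and the number of images needed to get at least one match with a given confidence follows from 1 - (1 - p)^n:

```python
import math


def expected_attempts(p: float) -> float:
    """Expected number of generated images until the first one meets the spec."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return 1 / p


def images_needed(p: float, confidence: float = 0.95) -> int:
    """Smallest n such that 1 - (1 - p)**n >= confidence."""
    if not 0 < p < 1 or not 0 < confidence < 1:
        raise ValueError("p and confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))


# Illustrative only: if about 1 design in 10 meets the specification.
n = images_needed(0.1, confidence=0.95)
```

Under these assumptions, a 10% success rate means roughly 10 images on average, but about 29 images to be 95% sure of getting at least one usable design.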

Mixing styles and themes

We conclude the overview of the initial design generation capabilities with a few examples of mixing multiple styles. We can start with engineering a prompt for an alternative style such as Warcraft-like fantasy:

isometric lumber mill in a wood, large sawmill blade, orcs and humans, fantasy style, realistic, high fidelity, video game, style of behance, made in blender 3D

This new style and the previously used cyberpunk style can be combined in a single prompt:

isometric large tower with a large radar antenna on the top, orcs and humans, fantasy style, post-apocalyptic cyberpunk style, realistic, high fidelity, video game, style of behance, made in blender 3D

The contribution of different styles can also be controlled using prompt weights. For example, we can make the fantasy style twice as important as the cyberpunk style, as depicted below:

isometric large tower with a large radar antenna on the top, orcs and humans, fantasy style, realistic, high fidelity, video game, style of behance, made in blender 3D ::2 isometric large tower with a large radar antenna on the top, post-apocalyptic cyberpunk style, realistic, high fidelity, video game, style of behance, made in blender 3D
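The weighted prompt above can be assembled from (text, weight) pairs. The sketch below uses hypothetical names and writes every weight explicitly, including the default weight of 1 for the last part; it also reports each part's normalized share, since Midjourney's documentation describes prompt weights as relative to one another.

```python
def multi_prompt(parts):
    """Build a '::'-separated multi-prompt from (text, weight) pairs.

    Returns the prompt string and each part's share of the total weight.
    """
    if not parts or any(weight <= 0 for _, weight in parts):
        raise ValueError("provide at least one part, each with a positive weight")
    total = sum(weight for _, weight in parts)
    text = " ".join(f"{part} ::{weight}" for part, weight in parts)
    shares = [weight / total for _, weight in parts]
    return text, shares


prompt, shares = multi_prompt([
    ("isometric large tower, orcs and humans, fantasy style", 2),
    ("isometric large tower, post-apocalyptic cyberpunk style", 1),
])
```

With weights 2 and 1, the fantasy part contributes two thirds of the total weight and the cyberpunk part one third.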

Creating variations and scaling up the process

Midjourney provides impressive capabilities for generating the initial designs based on structural and style instructions. However, it can be challenging to create collections of buildings with the same style, or to adjust minor details, using only textual prompts. In this section, we explore several techniques that help to address these limitations.

Midjourney provides a built-in Variations feature that can be used to generate alternative designs based on a specific initial design. However, this feature does not allow you to set the chaos parameter, and can be used only to generate small variations:

An alternative approach is to use the Image Reference feature, which provides much higher variability by default and also allows setting the chaos parameter explicitly:

https://s.mj.run/ZNA2WX0gXcQ isometric square lot with a small concrete bunker with a large radar dish on the top, concrete debris and metal boxes, abandoned, post-apocalyptic cyberpunk style, realistic, video game, style of behance, made in blender 3D, white background --c 10
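The image-reference prompt above has three parts in a fixed order: the reference image URL, the text prompt, and the optional chaos parameter. A minimal sketch of composing it, with a hypothetical helper name and only string assembly:

```python
from typing import Optional


def image_reference_prompt(image_url: str, text: str, chaos: Optional[int] = None) -> str:
    """Place the reference image URL first, then the text, then an optional --c value."""
    if not image_url.startswith(("http://", "https://")):
        raise ValueError("the reference image must be given as a URL")
    prompt = f"{image_url} {text}"
    if chaos is not None:
        prompt += f" --c {chaos}"
    return prompt


reference_prompt = image_reference_prompt(
    "https://s.mj.run/ZNA2WX0gXcQ",
    "isometric square lot with a small concrete bunker with a large radar dish on the top",
    chaos=10,
)
```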

The color correlation problem can be alleviated by replacing the problematic instructions in the prompt with a manually modified reference image. This produces reasonably good results for many applications:

https://s.mj.run/CasXo0nIkyk isometric perfectly white office building with a large flag on the top, winter, deep white snow, sunrise, cyberpunk style, realistic, video game, style of behance, made in blender 3D, white background

Creating variations using fine-tuning

All methods described above rely on manipulating the conditioning signal of the diffusion model that backs the Midjourney service. The level of control over the composition, details, and variability that can be achieved using this approach is somewhat limited. An alternative option is to fine-tune a diffusion model on manually selected, edited, or drawn reference images. This approach was introduced in the DreamBooth paper and productized in the Scenario service.

The fine-tuning approach can be illustrated with the following example. We start by generating multiple initial designs, and manually select a small set (usually 10-20 images) of the same style. This allows us to accurately control both the style and the variability of composition and structure:

The model fine-tuned on such a training set can be used to sample style-consistent designs based on short prompts such as the following:

radar dome
tall factory building

AI game asset generation: Conclusions

General-purpose, pre-trained text-to-image and image-to-image generative models provide impressive capabilities for game asset generation. Services like Midjourney and Scenario make these models accessible and enable highly productive asset development workflows. The techniques described in this blog post help to improve control over the generation process and address some typical needs, such as the generation of style-consistent collections of assets. We anticipate that the capabilities of generative AI services, as well as applied no-code techniques for using them, will rapidly evolve in the next few years, revolutionizing the design and game development industries.

