Polymorphic Code: What it is, how it works, and how it is used

What Polymorphic Code is, how it works, and real-world use cases like BusyBox.

Table of Contents

  1. Introduction
  2. What is Polymorphic Code
  3. How Polymorphic Codes are used and BusyBox

Introduction

Last night, just before going to sleep, I was scrolling through YouTube when I came across a video from Carles Cabergs titled Polymorphic Executables (recommended). It caught my attention because I had not heard of the concept before, apart from something related to Alpine-based Docker images (we will get to that later).

So, what exactly is “Polymorphic Code”?

What is Polymorphic Code

If we want a quick definition of Polymorphic Code, a simple search turns up something like:

Code that uses a polymorphic engine to mutate while keeping the original algorithm intact.

But what does that actually mean? To understand it better, let’s first recap how arguments are used in programs. For example, consider a file named main with the following code:

#! /bin/bash
echo "first argument: $0"
echo "second argument: $1"
echo "third argument: $2"

Now, if we execute the program, the output changes depending on the arguments we pass on the command line. For example:

root@covicale:~$ ./main covicale.com
 
first argument: ./main
second argument: covicale.com
third argument:
root@covicale:~$ ./main hello world
 
first argument: ./main
second argument: hello
third argument: world

One thing we notice is that $0, which the script labels as the first argument, always contains the name the script was called by. Strictly speaking it is argument zero, and the values typed after the script name start at $1.
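A detail that matters for the rest of the article is that $0 is the path used to start the script, so it changes with the way the script is invoked. The sketch below is only an illustration: the temporary directory and the script name show0 are made up for the demo.

```shell
#!/bin/bash
# Demo: $0 holds the path used to invoke the script, so it varies per call.
# "show0" and the temporary directory are hypothetical names for this sketch.
workdir=$(mktemp -d)

# Write a tiny script that reports $0 and its basename.
cat > "$workdir/show0" <<'EOF'
#!/bin/bash
echo "\$0 is: $0"
echo "basename is: $(basename "$0")"
EOF
chmod +x "$workdir/show0"

cd "$workdir" || exit 1
./show0            # invoked through a relative path
"$workdir/show0"   # invoked through an absolute path
bash show0         # passed to the interpreter as a file name
```

The first line of output differs in each of the three calls, while basename "$0" reduces all of them to show0. That is why a script that wants to branch on its own name usually strips the directory part first.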

Let’s take a look at the following code:

#!/bin/bash
 
case "$(basename "$0")" in
  "main")
    echo "main file called"
    ;;
  "polymorphic")
    echo "wow this is polymorphic"
    ;;
  *)
    echo "unknown executable name: $(basename "$0")"
    ;;
esac

If we try to execute it normally through the main file, we already know what is going to happen:

./main
main file called

Now, if we wanted to reach the polymorphic branch of our case statement, we might think that the only way is to rename the file. If that were true, this whole article would not exist and you would not be reading it.

So, how can we reach the other branches without renaming the file? The secret lies in symbolic links.

As noted earlier, $0 holds the name the script was called by. That is the path typed on the command line, which is not necessarily the name of the file that ends up being executed.

Right now we only have the main file, which is the entry point for executing the code. What happens if we create a symbolic link to that file under a different name? Let’s try it.

To create the symbolic link we use the ln command, which creates a link to a file, together with the -s option, which asks for a symbolic link rather than a hard link. As an example, we will create two symbolic links: polymorphic and whatisgoingon.

ln -s main polymorphic && ln -s main whatisgoingon

Once that is done, ls -l shows the symbolic links we just created:

ls -l
total 4
-rwxr-xr-x 1 root root 230 Mar 11 18:26 main
lrwxrwxrwx 1 root root   4 Mar 11 18:40 polymorphic -> main
lrwxrwxrwx 1 root root   4 Mar 11 18:40 whatisgoingon -> main
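The same trick also works with a hard link, because $0 still carries the name used on the command line. The following sketch runs in a throwaway directory and uses the hypothetical names tool, soft and hard to show how the two kinds of link can be told apart.

```shell
#!/bin/bash
# Sketch: compare a symbolic link and a hard link to the same script.
# "tool", "soft" and "hard" are hypothetical names used only for this demo.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > tool <<'EOF'
#!/bin/bash
echo "called as $(basename "$0")"
EOF
chmod +x tool

ln -s tool soft   # symbolic link: a small file that stores the path "tool"
ln tool hard      # hard link: a second directory entry for the same inode

# -L is true only for symbolic links.
[ -L soft ] && echo "soft is a symbolic link"
[ -L hard ] || echo "hard is not a symbolic link"

# -ef is true when both names resolve to the same file.
[ tool -ef hard ] && echo "tool and hard are the same file"

# readlink prints the path stored inside the symbolic link.
readlink soft

# Both kinds of link change what the script sees in $0.
./soft
./hard
```

A symbolic link has the advantage of being visible in ls -l, as in the listing above, which makes it easier to see where each name leads.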

Now, when we execute the links, they all run the same main file, but since $0 holds the name of the symbolic link, a different branch of the code is reached:

root@covicale:~$ ./main
main file called
 
root@covicale:~$ ./polymorphic
wow this is polymorphic
 
root@covicale:~$ ./whatisgoingon
unknown executable name: whatisgoingon
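When a script needs both pieces of information, the name it was called by and the file that actually runs, the two can be read separately. This is a minimal sketch that assumes readlink supports the -f option (GNU coreutils and BusyBox do, other systems may differ). The names realfile and nickname are made up.

```shell
#!/bin/bash
# Sketch: distinguish the invoked name from the file that is really executed.
# Assumes readlink supports -f. "realfile" and "nickname" are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > realfile <<'EOF'
#!/bin/bash
invoked_as=$(basename "$0")                  # name used on the command line
resolved=$(basename "$(readlink -f "$0")")   # file the link finally points to
echo "invoked as: $invoked_as, real file: $resolved"
EOF
chmod +x realfile
ln -s realfile nickname

./realfile
./nickname
```

Calling ./nickname reports nickname as the invoked name and realfile as the real file, which is the distinction the whole technique relies on.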

How Polymorphic Codes are used and BusyBox

The main reason for using symbolic links in this way is to save disk space: instead of keeping a separate binary for every command, a single executable is reused under different names, with symbolic links pointing to it. Another benefit is easier maintenance, since a patch or improvement applied to that one file reaches every linked command at once instead of having to be applied one by one.
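To make the space and maintenance argument concrete, here is a rough sketch of a multi-call script in the same spirit. It is not how BusyBox is implemented (BusyBox is a C program). The script name toolbox, the applets hello and upper, and the --install option are all invented for this example.

```shell
#!/bin/bash
# Sketch of a multi-call script: one file, several command names.
# Everything here (toolbox, hello, upper, --install) is hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > toolbox <<'EOF'
#!/bin/bash
applets="hello upper"

hello() { echo "hello, ${1:-world}"; }
upper() { echo "$*" | tr '[:lower:]' '[:upper:]'; }

name=$(basename "$0")
case "$name" in
  hello|upper)
    "$name" "$@"          # run the function matching the invoked name
    ;;
  toolbox)
    if [ "${1:-}" = "--install" ]; then
      # Create one symbolic link per applet next to this script.
      for applet in $applets; do
        ln -sf toolbox "$(dirname "$0")/$applet"
      done
    else
      echo "usage: toolbox --install"
    fi
    ;;
  *)
    echo "unknown applet: $name" >&2
    exit 1
    ;;
esac
EOF
chmod +x toolbox

./toolbox --install
./hello reader
./upper one file, many names
```

Each link stores only the short path toolbox, so editing that single file changes the behaviour of hello and upper at the same time.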

BusyBox is probably the most famous example. It is one of the oldest (more than 25 years old) and most widely used pieces of software that applies the concept of Polymorphic Code to reduce the size of its binaries, since it was originally created for embedded operating systems with very limited resources.

You may think you have never used BusyBox, and I thought the same. However, if you have built anything with Docker images, you have almost certainly used Alpine-based images. Almost all of the biggest and most used Docker images have their own Alpine variant (golang example: 1.24.1-alpine3.21). These images are based, of course, on Alpine Linux, a super lightweight Linux distribution that, guess what, uses BusyBox.

If you want to check this yourself, you can do it quickly: run a container based on the alpine image, move to the /bin directory, and execute ls -l. You will see that the commands listed there are just symlinks to the busybox program.

/bin # ls -l
total 792
lrwxrwxrwx    1 root     root            12 Feb 13 23:04 arch -> /bin/busybox
lrwxrwxrwx    1 root     root            12 Feb 13 23:04 ash -> /bin/busybox
lrwxrwxrwx    1 root     root            12 Feb 13 23:04 base64 -> /bin/busybox
lrwxrwxrwx    1 root     root            12 Feb 13 23:04 bbconfig -> /bin/busybox
-rwxr-xr-x    1 root     root        808712 Jan 17 18:12 busybox
lrwxrwxrwx    1 root     root            12 Feb 13 23:04 cat -> /bin/busybox
lrwxrwxrwx    1 root     root            12 Feb 13 23:04 chattr -> /bin/busybox
lrwxrwxrwx    1 root     root            12 Feb 13 23:04 chgrp -> /bin/busybox
lrwxrwxrwx    1 root     root            12 Feb 13 23:04 chmod -> /bin/busybox
lrwxrwxrwx    1 root     root            12 Feb 13 23:04 chown -> /bin/busybox
lrwxrwxrwx    1 root     root            12 Feb 13 23:04 cp -> /bin/busybox
...
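If you would rather count than read the listing, a small helper can do it. The function name count_links_to is made up for this sketch, and the demo runs against a temporary directory rather than a real Alpine container. The function body sticks to plain POSIX shell features, so it should also work when pasted into the container's shell and called with /bin and /bin/busybox.

```shell
#!/bin/bash
# Sketch: count the symbolic links in a directory that point at a given target.
# count_links_to is a hypothetical helper, not part of BusyBox or Alpine.
count_links_to() {
  dir=$1
  target=$2
  count=0
  for entry in "$dir"/*; do
    # Only symbolic links whose stored path equals the target are counted.
    if [ -L "$entry" ] && [ "$(readlink "$entry")" = "$target" ]; then
      count=$((count + 1))
    fi
  done
  echo "$count"
}

# Demo on a throwaway directory that mimics the layout shown above.
demo=$(mktemp -d)
touch "$demo/busybox"
ln -s /bin/busybox "$demo/cat"
ln -s /bin/busybox "$demo/chmod"
count_links_to "$demo" /bin/busybox   # prints 2
```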

I found it quite interesting that a feature like symbolic links, which I had only used to get quick access to something from the desktop, can serve a purpose like this, and that almost all of us have used it before without even knowing it :)
