my very first aarch64 assembly program

For some years now I have been thinking about learning how to write ARM assembly language applications. The driver for this is an old book, “Threaded Interpretive Languages: Their Design and Implementation” by R. G. Loeliger. I purchased a copy back in 1981, the year the book was first published, at a local Book Stop in Atlanta. A threaded interpretive language, or TIL, is the general classification for Forth and other languages like it. I wasn’t yet married and was still something of a night owl, so I had time on my hands to investigate how to implement a TIL. Over a period of three years I implemented the same personal TIL on the Z80 (the processor the book was written for), a MOS 6502, a Motorola 68000, and an Intel 8086. By the time I put the book aside I was doing a lot more “serious” work in C and Pascal on DEC minis and IBM PCs. The only time I went back to writing any assembly was later in the 1990s, when I wrote a fair bit of code on the side for a 65C802 and an Intel 80C196. Both were in custom-designed embedded systems. Unfortunately I didn’t write a TIL for either one, as the requirements directed me to spend my time on other functionality.

So this weekend I decided to dig a bit into the ARM processor driving the Nvidia Xavier NX. I went around the net a bit looking for examples and tutorials, and finally managed to cobble together a “Hello World” program in aarch64/ARMv8 assembly. Here’s the tiny application I wrote.

#include "include.h"

	.global	_start
	.text

_start:
	mov	x8, __NR_write
	mov	x2, hello_len
	adr	x1, hello_txt
	mov	x0, 1		// file descriptor 1 is stdout
	svc	0

	mov	x8, __NR_exit
	mov	x0, 0		// exit status 0
	svc	0

hello_txt:	.ascii "Hello, World!\n"
hello_len = . - hello_txt

I’m following, for the most part, the tutorial “A Guide to ARM64 / AArch64 Assembly on Linux with Shellcodes and Cryptography.” The code and the instructions for assembling it are towards the middle. My only comment about the program is that the adr line in my listing is different from the original. I found that if I wanted to load the register with the address of the string to print, I needed to code the adr mnemonic explicitly. For whatever reason the tools on the Xavier simply interpreted the original as a mov instruction, and nothing would print.
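
As a rough cross-check on what the two svc calls ask the kernel to do, here is a minimal Python sketch. It assumes the usual Linux AArch64 convention: the system call number goes in x8 and the arguments in x0 through x2. The names HELLO_TXT, HELLO_LEN, and say_hello are made up for illustration, and the call numbers in the comments (64 for write, 93 for exit) are the generic Linux values for this architecture, not something read out of include.h.

```python
import os

HELLO_TXT = b"Hello, World!\n"   # same bytes as the .ascii directive
HELLO_LEN = len(HELLO_TXT)       # what `. - hello_txt` works out to

def say_hello(fd=1):
    # write(fd, buf, count): the kernel expects these in x0, x1, x2,
    # with the system call number (64 on AArch64 Linux) in x8.
    # Returns the number of bytes written, as the real call does in x0.
    return os.write(fd, HELLO_TXT[:HELLO_LEN])
```

Calling say_hello() prints the greeting on stdout. Following it with os._exit(0) would play the part of the exit call (number 93), which ends the process without any further cleanup.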

Because of the number of steps involved in building the app, I wrote a bit of Python 3 to automate the process. My Python code turned out to be longer than my assembly code.

#!/usr/bin/env python3

import argparse
import os
from pathlib import Path
import subprocess
import sys

if sys.version_info < (3, 6):
    print("You are using Python version {}.{}.{}".format(
        sys.version_info.major, sys.version_info.minor, sys.version_info.micro))
    print("Python version 3.6.0 or higher is required.")
    sys.exit(1)

parser = argparse.ArgumentParser()
parser.add_argument("source", help="Assembly source file name is required.")
args = parser.parse_args()

if not os.path.isfile(args.source):
    print("File {} can't be found.".format(args.source))
    sys.exit(1)

filestem = Path(args.source).stem

preprocess = "cpp -E {} -o {}.as".format(args.source, filestem)
p = subprocess.run([preprocess], shell=True)
if p.returncode != 0:
    sys.exit(p.returncode)

assemble = "as {}.as -o {}.o".format(filestem, filestem)
p = subprocess.run([assemble], shell=True)
if p.returncode != 0:
    sys.exit(p.returncode)

link = "ld {}.o -o {}".format(filestem, filestem)
p = subprocess.run([link], shell=True)

There are no comments, and only a little white space to make it readable to me. I really didn’t feel like diving into either make or cmake. Hopefully I haven’t embarrassed myself too much with either program.
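
For anyone who wants the same three steps without going through a shell, here is a small sketch, not a replacement for the script. The helper name build_commands is hypothetical. It only builds the cpp, as, and ld command lines as argument lists, using the same tools and file names as the build script, so that file names with spaces need no quoting.

```python
from pathlib import Path

def build_commands(source):
    """Return the three build steps as argument lists, in run order."""
    stem = Path(source).stem
    return [
        ["cpp", "-E", source, "-o", stem + ".as"],  # run the C preprocessor
        ["as", stem + ".as", "-o", stem + ".o"],    # assemble to an object file
        ["ld", stem + ".o", "-o", stem],            # link into an executable
    ]
```

Each list can be handed to subprocess.run(cmd, check=True), which raises an exception when a step fails and so stops the build at the first error.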

What I’ve discovered so far is that there are a lot of 32-bit ARM assembly tutorials that won’t work at all with the Xavier. They just won’t assemble. But I am moving along a bit after this. I have Loeliger’s inner and outer interpreter coded, and two words in a dictionary. I’ll begin to post this effort shortly. As for why, well, why not? If nothing else, this hello program is 1,104 bytes long, which beats the size of Go’s basic hello world program by, what, three orders of magnitude?

There’s just something bracingly honest about writing in assembly that no other method of coding can approach.