Packaging and Distributing

Software packaging is the process of getting all parts of a software application into a format that can be used to run the application (or use the library) on a given user's system. Distribution is about how the application or library then gets to the user. (\cite{BibSoftwarePackaging})

It is a goal of Mize to be able to run on any computer system, from large servers to laptops to microcontroller-powered smart-home devices. Running Mize under Windows, Linux or macOS should be as simple as downloading the right executable file and running it: no installation required, it should just work. To achieve this, some thought needs to go into the packaging and distribution of Mize.

As mentioned in \ref{one-library-for-all-languages}, the core logic written in Rust will be used everywhere and by any programming language in which Mize is used. This means that if, for example, someone writes a program in Python that uses the Mize library, the machine code of the library may need to be different depending on the system the Python program is executed on. Things like this, and many more, need to be taken into account when thinking about how to package and distribute Mize.

Differences between computer systems Mize should be able to run on

There are countless characteristics that can differ between a user's system (also called the target system) and the system used for development, in this case of the Mize platform. User systems themselves also vary in many of these characteristics. The software we write is only useful to a user if it runs on whatever system that user has.

A broad summary of such characteristics of a computer system follows.

OS Kernel

A kernel is the main component of an operating system and the only one that actually interacts with hardware. To be able to talk to hardware, it runs with higher privilege than all other programs of the operating system and all user programs. A kernel's job also includes memory management, device-agnostic access to hardware devices, process management and file systems.

The two most used kernels are the Windows kernel, the most popular option for desktop computers, and Linux, the most popular option for server systems and smartphones. Two other frequently used kernels are XNU (short for "X is Not Unix"), used and developed by Apple for all of their operating systems, and the FreeBSD kernel, developed by the FreeBSD Project.

The differences important for distributing our software are the APIs used by userspace programs to interact with the kernel, called system calls or syscalls, and the file formats that executable programs and libraries need to be in so that the kernel can load and execute their machine code.

Another scenario that can be considered for distributing software is systems with no kernel at all, or with small micro- or realtime kernels. Such scenarios are often found in embedded devices; systems without any operating system are also called "bare metal" systems.

Syscalls

A userspace program has to use syscalls to do anything beyond modifying its own memory. Running another program, reading a file, using a network interface or allocating more memory are all, in some way, done via a syscall. To invoke a syscall, a program sets certain CPU registers according to the syscall specification and then executes a special instruction (called "syscall" on x86_64).

Unix-based kernels like Linux, the FreeBSD kernel and XNU take the approach of having a comparatively small number of syscalls and using special files and ioctls to provide all needed functionality; this comes to a few hundred syscalls. The Windows kernel has multiple thousands of syscalls, and they can change with every release of Windows. Unix-like kernels largely keep their syscall interfaces backwards compatible, mostly only adding new ones.

File Formats

The file containing the executable code we distribute has to be in the right format for the kernel to load it into memory and either start executing it as a new process or load it as a dynamic library.

The Executable and Linkable Format (ELF) is used by the Linux and FreeBSD kernels. Windows uses the Portable Executable (PE) format, which still begins with a DOS-era MZ header, and Apple's kernel uses a file format called Mach-O.

CPU architecture

The CPU architecture defines what instructions a CPU can execute and how they need to be structured. Common examples of CPU architectures are x86, ARM, AVR and RISC-V, but many more exist.

OS userspace

Windows copes with its ever-changing syscalls by providing system libraries, for example kernel32.dll and user32.dll, which programs should use to interact with the kernel. On Unix kernels, too, syscalls usually aren't invoked by your code directly; a so-called "standard library" does that. Besides functions that directly invoke syscalls, standard libraries also offer a lot of functionality that makes interacting with the kernel more convenient. Multiple such standard libraries exist. The most widely used is glibc, developed as part of the GNU project; others include musl, Bionic and newlib.

Userspace-wise, Windows makes it easy for us, as there is only one userspace. Linux, however, has over 1000 different distributions. Some are very similar, like Debian and Ubuntu, and some do things completely differently, like Android or NixOS.

Available Hardware and Drivers

On some systems there will be special hardware, for example for rendering graphics or decoding a video stream. Our software should take advantage of it if available.

Implementing the distributing part

Now that we know what needs to be taken into account for packaging this project, the following paragraphs document the implementation of the distribution and packaging part of Mize.

Cross Compiling

Cross compiling is the process of building software that should run on a computer system that differs, in one or more of the characteristics mentioned in \ref{differences-between-computer-systems-mize-should-be-able-to-run-on}, from the system used to compile the software.

Copying the source code of Mize onto every system we want to release for and building it there is simply not practical: it would require having one of every kind of system and take far too long. So cross compiling is certainly something we need to build into the distribution process. Cross compiling usually means installing some sort of cross toolchain and running it to compile the source code, and every system we want to compile for needs its own toolchain installed. This imperative approach firstly takes a lot of manual effort; secondly, the installed cross toolchains, as well as the toolchain native to the build system, often interfere with one another, which can produce unexpected errors during compilation. Those errors then have nothing to do with the compiled source code, but with the toolchains and how they were installed.

Nix

Nix is the name of a package manager and of a domain-specific language (DSL) that this package manager uses to define packages.

The first major difference to most other package managers is that a package is not defined as a set of attributes such as name, version and a list of dependency packages. With Nix, a package is a function in the Nix language. The parameters of this function are every dependency, compilation options, the compiler, and several "build functions" specific to the targeted system and the language of the project. The function defining a package then calls one of those "build functions", passing it the name, version, source code and other metadata of the package.

With most package managers apart from Nix, a package defines the path it is installed into; the package manager simply runs the project's install code, for example make install for make-based projects. This method leads to a significant problem when you want to install two different versions of a package: both versions will want to install files to the same paths. The Nix package manager instead installs each package into a path unique to that package, which looks like this: /nix/store/<hash>-<name>-<version>/, where <hash> is a hash over all "inputs" (the arguments passed to the "build functions"). The hash is therefore only the same if you install a package with exactly matching dependencies, compiler options, target system, and so on.

How Nix helps with cross compiling

Nix runs all builds in a sandboxed environment in which only the needed programs, dependencies and toolchains are available. This fully eliminates errors that arise from a toolchain, how it is installed, or interference with other installed toolchains or programs. All the details of this sandboxed environment, including which toolchain and dependencies are available, are declaratively defined in the package definitions using the Nix DSL. It is also worth noting that Nix uses hashing to ensure exactly the same versions of toolchains, dependencies and programs are used, making these environments fully reproducible.

The Module System

Only the core parts of the Mize platform are in the Mize library itself; all other functionality is provided by external modules. A type can also provide a module containing functionality for that type, for example code to check whether an update is valid. A module is a folder containing a mize_module.nix file, in which the Nix DSL is used to define what the module does and how to build it.

The distributing process

Once the Nix definitions are written, a single command builds all versions of Mize and all modules for all target systems: nix build github:c2vi/mize#dist. It can be run on any Linux system, the only prerequisite being that Nix is installed. It creates a path in the Nix store, which can then simply be rsync'd onto a webserver, from which users can download the correct executables and other files. A shell script called "deploy" in the Mize repository does essentially that; it also checks whether the path was successfully created by the nix build invocation with [[ "$path" != "" ]] and only then runs the rsync command. ocih is the hostname of the host the webserver runs on. This host can only be accessed when the script is run on my local machine, which holds the right private key.

path=$(nix build .#dist -L -v --print-out-paths "$@")
[[ "$path" != "" ]] && rsync -rv "$path"/* ocih:host/data/my-website \
    --rsync-path="sudo rsync"