part of: my Diploma Thesis

The current data management situation

The example of a small life enhancement

Suppose, for example, that you have two unused function keys on your keyboard and would like to use them to dim your smart desk lamp. Looking only at what the hardware is theoretically capable of, this would work as follows: whenever one of those keys is pressed, the desktop computer sends a frame with the new brightness value through its Ethernet connection to the router, which forwards it to the smart lamp via Wi-Fi, and the lamp then changes the voltage going to the LEDs accordingly. In practice, once we take the software running on these devices into account, implementing such a small life enhancement requires far too much work, self-written software and programming skill.

We would have to install software like AutoHotkey \cite{BibAutoHotkey}, which can react to key presses and run code. We would have to write that code ourselves so that it updates the brightness value, for example by sending an HTTP request to the IP address of the lamp. We could also be unlucky and have a lamp that only talks to the manufacturer’s cloud and has no HTTP server through which its brightness value can be changed. In that case we would have to find documentation for the cloud’s APIs, or even reverse engineer them, to be able to tell the cloud to change the brightness value. The cloud could also add delays of up to several seconds.
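The code that such a hotkey tool would have to call could look roughly like the following sketch. It assumes a lamp that exposes a local HTTP endpoint. The path `/brightness`, the JSON payload and the 0 to 100 range are hypothetical and not taken from any real product.

```python
import json
import urllib.request

# Assumed brightness range; a real lamp may use a different scale.
MIN_BRIGHTNESS = 0
MAX_BRIGHTNESS = 100


def step_brightness(current: int, delta: int) -> int:
    """Return the new brightness, clamped to the assumed 0..100 range."""
    return max(MIN_BRIGHTNESS, min(MAX_BRIGHTNESS, current + delta))


def build_request(lamp_ip: str, brightness: int) -> urllib.request.Request:
    """Build (but do not send) the HTTP request for a hypothetical lamp API."""
    body = json.dumps({"brightness": brightness}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{lamp_ip}/brightness",  # hypothetical endpoint
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send(request: urllib.request.Request, timeout: float = 1.0) -> int:
    """Send the request to the lamp and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

One key would then call `send(build_request(ip, step_brightness(current, +10)))` and the other the same with `-10`. Even in this best case, the script has to keep track of the current brightness itself.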

All this trouble exists for two reasons. Firstly, the operating system on the desktop computer has no functionality to map two keys to incrementing and decrementing some data value (in our case the brightness value of the lamp). Secondly, there is no such magic “data value” that can be modified on one device (the desktop computer) and have an effect on another (the lamp).

Lack of general data management systems or standards

A long time ago, when Unix and the command line were how computers were used, the filesystem was the only data management API. Every application on the system simply opened, read and wrote files, most of which contained human-readable text. One part of the Unix philosophy was “Everything is a file”, so even if you wanted to find out, for example, your IP address, you would read a file in a special filesystem. This gave users great flexibility in choosing which program to use to view or modify the content of a file. Programs could also be combined using so-called Unix pipes \cite{BibUnixPipes}, which allowed for endless possibilities of what to do with the data of one or many files.
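The composability that pipes provide can be illustrated with a small analogy in Python. This is only a sketch: it chains generator functions inside one process instead of using real operating system pipes, and the function names merely mirror the Unix tools `grep`, `sort` and `uniq`.

```python
from typing import Iterable, Iterator, List


def grep(lines: Iterable[str], needle: str) -> Iterator[str]:
    """Pass on only the lines that contain `needle` (like `grep`)."""
    return (line for line in lines if needle in line)


def sort_lines(lines: Iterable[str]) -> Iterator[str]:
    """Emit all lines in sorted order (like `sort`)."""
    return iter(sorted(lines))


def uniq(lines: Iterable[str]) -> Iterator[str]:
    """Drop adjacent duplicate lines (like `uniq`)."""
    previous = None
    for line in lines:
        if line != previous:
            yield line
        previous = line


def pipeline(text: str) -> List[str]:
    """Rough equivalent of the shell pipeline: grep lamp | sort | uniq"""
    return list(uniq(sort_lines(grep(text.splitlines(), "lamp"))))
```

Each stage only knows about lines of text, so any stage can be swapped or reordered without changing the others. That shared, simple data format is what made the combination possible.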

Nowadays, the data stored by programs is more complex than just text, and every person has many computer systems in their life, making it necessary to have data available across devices. These added requirements have led to every application implementing its data management in its own way. The problem with this is that integrating with other applications becomes a feature that the developers need to implement, rather than something that is possible from the beginning. Many modern apps proudly list on their websites all the other applications they can integrate with, but those lists never cover all apps that deal with the same kind of data. For example, in our HTL there are four places where a teacher can specify homework items: Microsoft Teams, WebUntis, Moodle and SchoolFox. It is not possible to show all of your homework in one single list. Instead, you need to visit all four websites/apps and check manually.
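Once the data is available in a common form, building the single list is the easy part, as the following sketch suggests. The `Homework` record and its fields are invented for illustration. None of the four platforms is known to export data in this shape, and getting them to do so is exactly the missing piece.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class Homework:
    """Hypothetical common record for a homework item."""
    source: str  # platform the item came from, e.g. "Moodle"
    title: str
    due: date


def merge_homework(per_source: Dict[str, List[Homework]]) -> List[Homework]:
    """Flatten the per-platform lists into one list sorted by due date."""
    merged = [item for items in per_source.values() for item in items]
    return sorted(merged, key=lambda item: (item.due, item.source, item.title))
```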

Another example comes from IoT, where this problem really limits what you can do with IoT devices. My family has solar panels, an inverter, a battery, a heat pump and an electric car charging box installed at our home. Both the inverter and the battery connect to our Wi-Fi network, which they use to talk to their respective clouds, allowing us to view data about them from anywhere we have an Internet connection. The problem, however, is that this setup makes it impossible, or at least extremely difficult, to get this data in real time into a local Grafana instance, which would be my tool of choice for showing and analyzing the graphs of our energy system. There is also no way to tell the car charger to charge the car whenever the inverter would otherwise be selling power to the grid. The car charger uses Bluetooth to talk to the app that came with it and can optionally be connected to the manufacturer’s cloud. The car even has LTE hardware so that it can always talk to its cloud, and for our heat pump the company installed an extra LTE router in our home so that the heat pump could talk to, you guessed it, their cloud. The data of our energy system is spread across five highly incompatible data management systems, and there is no single place to view and modify all important data of all components. In my view this is a ridiculous situation, and my family and everyone else I tell about this agrees.
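The rule for surplus charging itself would be trivial if the values were available in one place. A minimal sketch, assuming the inverter's production and the house load (without the car) can be read as plain numbers in watts; the function names and the 1400 W minimum charging power are assumptions, and the battery is ignored for simplicity:

```python
# Assumption: roughly the power drawn at 6 A on one 230 V phase.
ASSUMED_MIN_CHARGE_WATTS = 1400.0


def surplus_watts(pv_watts: float, house_load_watts: float) -> float:
    """Power that would be sold to the grid; house load excludes the car."""
    return max(0.0, pv_watts - house_load_watts)


def should_charge(
    pv_watts: float,
    house_load_watts: float,
    min_charge_watts: float = ASSUMED_MIN_CHARGE_WATTS,
) -> bool:
    """Charge the car only if the surplus covers the minimum charging power."""
    return surplus_watts(pv_watts, house_load_watts) >= min_charge_watts
```

The hard part is not this logic but getting `pv_watts` out of one cloud and the charge command into another device.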

Every application having its own data management system allows developers to implement data management in the way that works best for the use case of their application, but many applications also fail badly at implementing data management properly. For example, our inverter, at least in the past, sent its data to the cloud encrypted with AES-128, but the passphrase consisted of all ones \cite{BibGoodweProto}. It also takes five minutes for the newest data to be shown on the cloud’s website.

A look at existing data management systems

iCloud

If you only have Apple devices and simply turn on iCloud, you already get many of the things that the Mize data engine would enable. A picture you take on your iPhone is immediately available on your Mac, you can mirror the screen of your iPhone to your Mac, and you can see the battery levels of all your devices on every device. The problem with the Apple ecosystem, however, is that it only works with Apple devices and data can only be stored on Apple’s servers. Integrations with all the other systems you use personally or for work are also nonexistent. In addition, Apple’s ecosystem is not particularly good at allowing users to combine tools creatively, as Unix pipes do, for example.

InstantDB

InstantDB is an awesome system for managing the data of a modern application. It has nice APIs to access and update data, does so in real time and can deal with clients going offline for some time. All data is stored on a server, which can also be self-hosted. The problem with InstantDB is that it is made for developers to build apps with, not for users who want to store their personal data in it. It also only has a client for JavaScript, which makes it unusable in embedded scenarios on microcontrollers.

Java Hibernate

Hibernate is an object-relational mapping (ORM) library for Java. Because it is tied to Java, it cannot fulfil the goals of the Mize data engine.

.NET Entity Framework

Entity Framework is the .NET counterpart to Hibernate, for .NET languages such as C# or F#, and is limited in the same way.
