Once your product leaves the lab and enters the real world, it becomes important to have a remote update concept in place. No matter how good your code or how lean your application is, there will come a time when you need to adapt to a changing environment, fix bugs or simply reconfigure the application.
Updates may involve the ‘application.bin’ file and/or further files stored in the file system of the on-board serial flash memory, e.g. configuration files that your software needs for operation. The following diagram illustrates the major steps:
Update concept – General requirements
An update concept has to meet numerous requirements. Some of them are discussed below:
In contrast to consumer devices such as laptops and smartphones, IoT systems often run autonomously in remote areas where end users cannot control, monitor or intervene in the update process. Updates must therefore run independently and tolerate interruptions, loss of signal and power loss. This applies at any time and at every step of the update process (including flash memory access, downloads etc.). It also implies that a robust system must be able to restart successfully at any time.
Further info: All libraries involved are designed with exactly these requirements in mind. E.g. the SerialFlash – Library introduces ‘record-set’ files that ensure files are written correctly and completely. The Update – Library handles the mechanism for switching safely between the new and the previous application.
An essential capability of any remote update system is to ensure that a properly functioning application is always available on the device. Even if an update fails or the new application does not behave as desired, the system must be able to “fall back” to the previous version. This mechanism is often referred to as “rollback”. It assumes that the system has enough memory to store at least two complete applications and their accompanying files. On dice devices, this is accomplished by the SPI flash memory that each dice has on board.
A typical rollback system works as follows: Before the update system “flashes” the new application, it creates a “recovery image” of the current system and stores it in the SPI flash memory chip. After the reset, the bootloader flashes/activates the new application (see Update – Library). The new application then starts and connects to the update server again, which proves that it can access the Internet, is reachable and works correctly. If this check fails, the bootloader falls back to the “recovery image” and the device runs the original application again.
It is up to the server whether to re-provision the update or to declare the update attempt failed. The SysSync system implements this through a job counter that limits the number of times a device is presented with an update job.
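The rollback sequence can be illustrated with a small host-side simulation. This is only a sketch of the logic in Python, not firmware and not the Update – Library API: the names `Device`, `apply_update` and `server_check` are hypothetical, and `server_check` merely stands in for the new application reporting back to the update server.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Device:
    """Toy model of a device with room for two application images."""
    active: str                     # application currently running
    recovery: Optional[str] = None  # recovery image kept in SPI flash


def apply_update(device: Device, new_app: str,
                 server_check: Callable[[str], bool]) -> bool:
    """Activate new_app and roll back if the post-update check fails.

    Returns True if the update was kept, False if it was rolled back.
    """
    # 1. Store a recovery image of the current application first.
    device.recovery = device.active
    # 2. After the reset, the bootloader activates the new application.
    device.active = new_app
    # 3. The new application must prove that it can reach the server.
    if server_check(device.active):
        return True
    # 4. The check failed: fall back to the recovery image.
    device.active = device.recovery
    return False
```

Because the recovery image is stored before anything is overwritten, the model never reaches a state in which no working application is available.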
Integrity, compatibility and sanity
Update packages consist of an application and accompanying files that must be checked for correctness before they are used. Typical checks verify size, content (e.g. using a CRC), compatibility, name and version. Applications are compiled to run on one specific target hardware type only (e.g. dice-IO or dice-WiFi), so an update system must ensure that an application is only transferred to a compatible target type. The dice system implements compatibility checks at several points of the update process. The integrity of each downloaded artefact is verified with a 32-bit CRC checksum. The SysSync system declares an update sane once the new application has reported to the server that the application and all files of the update job match the ones on the device. Alternatively, the Update – Library provides the function i_update_confirm(), which gives developers the freedom to implement their own sanity checks instead.
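A size and checksum test of a downloaded artefact might look like the following Python sketch. It assumes the common CRC-32 variant implemented by `zlib.crc32`; the text above only states that a 32-bit CRC is used, so the actual polynomial of the dice system may differ. The function names are hypothetical.

```python
import zlib


def crc32_of(data: bytes) -> int:
    """CRC-32 of data as an unsigned 32-bit value (zlib variant, assumed)."""
    return zlib.crc32(data) & 0xFFFFFFFF


def verify_artefact(data: bytes, expected_size: int, expected_crc: int) -> bool:
    """Accept an artefact only if both its size and its CRC match."""
    if len(data) != expected_size:
        return False  # truncated or oversized download
    return crc32_of(data) == expected_crc
```

Checking the size first rejects truncated downloads cheaply before the checksum is computed.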
Security and authentication
Browser clients and devices may use secure TLS communication for any kind of data exchange with the server. Device authentication via device-unique credentials makes sure that a single compromised device does not leave the remaining fleet vulnerable. A sound registration mechanism protects against fake device and user requests, so that only the rightful owner can gain control over their devices and account.
Versions of applications and files are usually managed with some kind of naming convention and with version control tools such as Subversion and Git. An update system must also take versioning into account, since it has to reliably store, distinguish and provide update artefacts (files/applications) for different targets, different customers and in different versions at the same time. It must also make sure that server storage space is not wasted on duplicates or abandoned files. Another aspect to consider is configuration management, which ensures that only functional and well-tested packages are deployed and that they are clearly identifiable. Note: The dice tool chain provides an optional automatic versioning scheme which is also supported by the SysSync server. Further details can be found here.
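One common way to serve many targets and versions without storing duplicates is to keep each distinct file content only once and to index it by target, name and version. The Python sketch below illustrates that idea only; it is an assumption about how such a store could work, not a description of the SysSync server, and `ArtefactStore` is a hypothetical name.

```python
import hashlib
from typing import Dict, Tuple


class ArtefactStore:
    """Stores each distinct file content once, indexed by (target, name, version)."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}                  # digest -> content
        self.index: Dict[Tuple[str, str, str], str] = {}   # key -> digest

    def add(self, target: str, name: str, version: str, data: bytes) -> str:
        """Register an artefact and return the digest of its content."""
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # identical content is stored once
        self.index[(target, name, version)] = digest
        return digest

    def get(self, target: str, name: str, version: str) -> bytes:
        """Return the content registered for one target, name and version."""
        return self.blobs[self.index[(target, name, version)]]
```

With this layout, the same configuration file registered for two target types occupies storage only once, while each target and version remains individually addressable.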
Maintaining device-specific configurations
In most applications, devices have configuration values that are unique or depend on their environment. Examples are WiFi SSIDs/passwords, APN settings, SIM PINs, cloud URLs or device credentials, but application settings (e.g. CAN logging settings) can be just as important to preserve. These settings must not be touched by the update process. The memory organization of dice devices takes these requirements into account. The BOOTLOADER SETTINGS – Library allows the developer to store user data in a bootloader-protected area that remains untouched during system updates. Another option is to use SPI flash files to store device-specific data. The FlashStorage library, on the other hand, provides functions for storing data in the application memory area, which survives a power cycle but is overwritten during a system update.
Sound device commissioning, registration and handover concept
There is a significant difference between a developer updating individual devices and a live system operating in a ‘real-life’ environment. A system fit for real-world use must therefore provide processes to program, manage and assign devices on a large scale as well as individually or in groups. Natural processes of a product life cycle are often neglected but are vital to support, for example the handover of devices from one user to the next (e.g. when a device is sold). SysSync provides two ways of registering and assigning devices: either with the help of the dice-config tool during commissioning, or by specifying the particular user account within the application code for automatic registration requests. See the description of i_doSysSync() and the ‘Tutorial SysSync’ for details. When a device is sold on, the previous owner (e.g. the system integrator) might want to hand over device maintenance to the account of the future owner as well. During the handover process, it is important that the device is never left without an assigned account.
Delta updates for minimal data traffic and overhead
For most of a product’s lifetime, the application is used for its intended purpose and system updates are performed only rarely. The system resources and operating costs consumed by the update system must reflect this ratio. In particular, data transfer time and volume, the resulting recurring costs and the impact on the actual application must be kept as low as possible. On the other hand, the update concept must make sure that the system regularly reports its status and checks for update jobs. The SysSync library achieves this with a single HTTP POST request (every 15 minutes by default). Depending on the system state (e.g. the number of files in the file system), this request is typically less than a hundred bytes in size and takes only about a second to transmit. When it comes to downloading update components, the system makes sure that files that have already been downloaded successfully are not transferred again (e.g. after interruptions or errors). Only the files still missing for the update job are requested and downloaded in the following process steps.
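The "download only what is missing" rule can be expressed as a comparison between the files required by the update job and the files already on the device. The following Python sketch assumes that files are identified by name and CRC; the function name and the data layout are hypothetical and do not reflect the SysSync protocol.

```python
from typing import Dict, List


def files_to_download(job_files: Dict[str, int],
                      on_device: Dict[str, int]) -> List[str]:
    """Return the names of job files that are absent or differ on the device.

    Both arguments map a file name to its CRC-32 checksum.
    """
    return sorted(
        name for name, crc in job_files.items()
        if on_device.get(name) != crc
    )
```

If a download is interrupted and the process restarts, running the same comparison again yields only the files that are still outstanding.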
Minimal disruption to the running system
Any update client software (e.g. the SysSync library) must keep disruption of the main application during regular operation to a minimum. This is why the SysSync library is designed to be non-blocking: regular server polling happens only rarely and only if the application ‘decides’ to provide access to the internet. The library informs the application about requirements and upcoming process steps beforehand, so that the application can react accordingly. Examples are a request to establish an internet connection so that the update library can report to and poll the server, upcoming modifications of the file system (so that the application can stop accessing it), or a reset that is about to be triggered by the update library. If an update job is due to be processed, the necessary file/app downloads are broken down into small chunks of data so that the application is not significantly impacted either.
Intuitive creation, maintenance and monitoring of update jobs, files, devices and groups
A useful update system has to be clear, appealing and intuitive to use. Users must be able to manage their devices and create updates easily, without in-depth knowledge of the underlying system. At the end of the day, a system is only helpful if it is actually used and becomes part of the workflow.
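The effect of chunked transfers on the main application can be shown with a cooperative loop in which at most one chunk is handled per pass. This Python sketch is a simulation only: `download_in_chunks` and `main_loop` are hypothetical names, the "download" reads from an in-memory byte string, and `chunk_size` is assumed to be greater than zero.

```python
from typing import Iterator, Tuple


def download_in_chunks(source: bytes, chunk_size: int) -> Iterator[bytes]:
    """Yield source piece by piece; each step is one short transfer."""
    for offset in range(0, len(source), chunk_size):
        yield source[offset:offset + chunk_size]


def main_loop(source: bytes, chunk_size: int) -> Tuple[bytes, int]:
    """Interleave application work with one chunk transfer per loop pass.

    Returns the received data and the number of loop passes.
    """
    received = bytearray()
    app_ticks = 0
    transfer = download_in_chunks(source, chunk_size)
    while True:
        app_ticks += 1                # stand-in for the application's own work
        chunk = next(transfer, None)  # at most one chunk per pass
        if chunk is None:             # transfer finished
            break
        received += chunk
    return bytes(received), app_ticks
```

The application regains control after every chunk, so the time it is blocked is bounded by the time needed for a single chunk rather than for the whole file.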
Data forwarding & multi channel distribution for decentralised systems
The dice system approach is decentralised, which enables the system designer to place the required functionality at the right locations. This also means that not every component has its own means of communicating with the Internet. An update system that claims to serve decentralised architectures must solve this issue, too. It has to tunnel the required data traffic through the CAN bus to a device that serves as a ‘gateway’ to the internet. The CANIP library addresses this requirement by letting radio modules provide internet access to other CAN bus components. The SysSync update library can use numerous communication technologies to connect to the update server, one of which is CANIP. This enables every dice device (e.g. dice-CPU) to receive updates as long as at least one device on the CAN bus (e.g. dice-WiFi) provides it with internet access.
Scheduled / phased deployments
Scheduling a software update deployment to start at a specific date and time of day is an important aspect of an update system. This feature helps to further limit the disruption of the end user's application. Typically, updates are scheduled for nights and/or weekends. Scheduled deployments are also used for phased rollouts, which reduce the risk of large-scale failures by dividing a deployment into time-delayed phases, with each phase containing a share of the devices to be updated.
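A phased rollout plan can be derived by spreading the devices over a number of start times. The Python sketch below distributes devices round-robin; this is one possible policy chosen for illustration, not the scheduling used by the SysSync server. The function name is hypothetical, and `gap` is assumed to be greater than zero so that the phases have distinct start times.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def plan_phases(devices: List[str], phases: int,
                start: datetime, gap: timedelta) -> Dict[datetime, List[str]]:
    """Spread devices round-robin over `phases` time-delayed start times."""
    plan: Dict[datetime, List[str]] = {
        start + i * gap: [] for i in range(phases)
    }
    slots = list(plan)  # start times in chronological order
    for index, device in enumerate(devices):
        plan[slots[index % phases]].append(device)
    return plan
```

A failure observed in the first phase can then be investigated before the later phases start, which limits the number of devices affected.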
Controlled retry of deployments
If a deployment fails on a device, the server automatically repeats it up to a predefined number of times. This aspect is particularly important in conjunction with the rollback mechanism: limiting the number of retries prevents a failing device from constantly switching back and forth between the old and the new application. After the configured number of retries, the update attempt is aborted and the account holder is informed.
In some circumstances it is necessary to know the latest physical location of the target devices. The SysSync system provides this function if configured to do so. Positioning information is considered a matter of privacy, which is why this function must be activated by the developer of the target code. Developers are thus free to choose whether or not their application reveals positioning information to the update server. Naturally, the target system must either have GPS information available on the CAN bus (DICE_GPS- Library) or come with a receiver on board (dice-IoT, -GNSS etc.).
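The bounded retry behaviour can be sketched as a loop with an attempt counter. The Python example is illustrative only: `deploy_with_retries` and `attempt_update` are hypothetical names, and `attempt_update` stands in for one complete update attempt including a possible rollback.

```python
from typing import Callable, Tuple


def deploy_with_retries(attempt_update: Callable[[], bool],
                        max_attempts: int) -> Tuple[bool, int]:
    """Try an update at most max_attempts times.

    Returns (succeeded, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        if attempt_update():
            return True, attempt
    # Give up instead of switching between old and new application forever.
    return False, max_attempts
```

When the function returns `(False, max_attempts)`, the server would mark the job as failed and notify the account holder instead of offering the job again.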
Keeping the admin up to date
Deploying software updates is typically not a full-time job but is usually managed alongside development activities. There must therefore be a way to inform the user about important events concerning open update jobs without the need for constant monitoring. SysSync does this (if you like) via emails that report the actual start, completion or failure of your updates.
Another helpful feature for quickly informing admins about the health of their devices is integrated in the overview page, which shows the devices' ‘heartbeats’. A heartbeat is essentially the last time the device reported to the server.
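A heartbeat overview boils down to comparing each device's last report time with the current time. In the Python sketch below, the function name and the silence threshold are assumptions; with the default 15-minute report interval mentioned earlier, a threshold of 45 minutes would correspond to three missed reports.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def stale_devices(last_seen: Dict[str, datetime], now: datetime,
                  max_silence: timedelta) -> List[str]:
    """Return the devices whose last report is older than max_silence."""
    return sorted(
        device for device, seen in last_seen.items()
        if now - seen > max_silence
    )
```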
The good news: This is all done with a single line of code.