Tribblix updates and https The Trouble with Tribbles...

One good thing to have happened recently is the rise of Let's Encrypt, bringing https to all websites without all the hassle you previously had to go through to get a certificate.

One not quite so good event recently was the switch by Let's Encrypt to certificates chaining up to their own ISRG Root X1, and, more excitingly, the expiry of the prior DST Root CA X3 signing certificate.

My experience of this is that most things just worked, but I'm still seeing odd cases where clients can't connect. Generally, browsers work just fine; CLI tools are a bigger issue.

This might be due to a couple of issues. Sometimes the software itself guesses wrong (older openssl 1.0.2 for example); sometimes the system's CA bundle of trusted root certificates needs updating.

For a while now, the Tribblix package repositories have been served over https and the zap tool for package management has been configured to use https. There are cases where it falls foul of the above issues.

This might occur on older Tribblix releases - I've seen this on m22, for example.

It turns out that curl fails, but wget works. Again, that's an example of the inconsistency in behaviour that I see. You need to update the CA bundle on m22, but if the package update tool is broken that's a bit tricky.

There's an ugly hack, though, because zap will try wget if it can't find curl. So just move curl out of the way temporarily:

    mv /usr/bin/curl /usr/bin/curl.t
    zap refresh
    zap update TRIBca-bundle
    mv /usr/bin/curl.t /usr/bin/curl

and you should be good to go again.
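The fallback that makes this hack work can be sketched roughly as follows. This is not zap's actual code: the function name pick_downloader and the exact curl and wget flags are assumptions for illustration. The point is that the choice of client depends only on which binary is found, so renaming curl is enough to switch to wget.

```python
import shutil
from typing import Callable, List, Optional


def pick_downloader(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[List[str]]:
    """Choose a download command: curl if present, otherwise wget.

    A hypothetical sketch of the fallback described above, not zap's code.
    Returns the command prefix that fetches a URL to stdout, or None when
    neither tool can be found.
    """
    if which("curl"):
        # -f: fail on HTTP errors, -s: no progress meter, -S: still report errors
        return ["curl", "-fsS"]
    if which("wget"):
        # -q: quiet, -O -: write the download to stdout
        return ["wget", "-q", "-O", "-"]
    return None


def only_wget(name: str) -> Optional[str]:
    """Fake PATH lookup in which curl has been moved out of the way."""
    return "/usr/bin/wget" if name == "wget" else None


print(pick_downloader(only_wget))  # ['wget', '-q', '-O', '-']
```

Passing a fake lookup function makes it easy to see the effect of moving curl aside without touching a real system.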

There's another way, of course: edit the *.repo files in /etc/zap/repositories to change the URL from https to http. That's not particularly recommended (although the packages are signed and the signatures are checked).

One thing that last hack demonstrates is the value in using simple text files.

Inside zone installation The Trouble with Tribbles...

 How do zones actually get put together on Solaris and illumos? Specifically, how does a zone get installed?

 There are various types of zones, and the term for a zone's type is its brand. A zone's brand defines how it is installed and managed, and what properties it has. Often, the brand is mapped to a zone template, which is the default configuration for a zone of that type or brand.

 (By the way, this overlap between template and brand can be seen in the create subcommand of zonecfg. You do "create -t SUNWlx" to build a zone from a template, which is where the -t comes from. It's not the create that sets the brand, it's the template.)

 The templates are stored as xml files in /etc/zones, as are the configurations of the zones you've created, which is a bit confusing. So in theory, if you wanted a custom template to save adding so much to your zonecfg each time, you could add your own enhanced template here. The actual list of zones is in /etc/zones/index.

 In fact, Tribblix has template zones, which are sparse-root zones built from a different image to the global zone. They are implemented by building an OS image that provides the file systems to be mounted read only, and a template xml file configured appropriately.

 One of the things in the template is the brand. That maps to a directory under /usr/lib/brand. So, for example, the TRIBsparse template in /etc/zones/TRIBsparse.xml sets the brand to be sparse-root, in addition to having the normal lofs mounts for /usr, /lib, and /sbin that you expect for a sparse-root zone. There's a directory /usr/lib/brand/sparse-root that contains everything necessary to manage a sparse-root zone.
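To make the link from template to brand directory concrete, here is a small sketch that reads the brand from a zone template and works out the matching directory under /usr/lib/brand. It assumes the brand is recorded as a brand attribute on the root zone element; the one-line XML below is a made-up fragment, not the real TRIBsparse.xml, and brand_dir is a hypothetical helper name.

```python
import os
import xml.etree.ElementTree as ET

# A trimmed, illustrative template: only the root element matters here.
SAMPLE_TEMPLATE = '<zone name="example" zonepath="" autoboot="false" brand="sparse-root"></zone>'

BRAND_ROOT = "/usr/lib/brand"


def brand_dir(template_xml: str, brand_root: str = BRAND_ROOT) -> str:
    """Return the brand directory named by a zone template's brand attribute."""
    root = ET.fromstring(template_xml)
    brand = root.get("brand")
    if not brand:
        # This sketch does not try to guess a default brand.
        raise ValueError("template does not set a brand")
    return os.path.join(brand_root, brand)


print(brand_dir(SAMPLE_TEMPLATE))  # /usr/lib/brand/sparse-root
```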

 In there you'll find a couple more xml files: platform.xml and config.xml. Much of what's in those is internal to zones. Of the two, config.xml is the more interesting here, because it has entries that match the zoneadm subcommands, one of which is the install entry. For TRIBsparse, it is:

 /usr/lib/brand/sparse-root/pkgcreatezone -z %z -R %R

 When you invoke zoneadm install, this script gets run, and you get the zone name (-z) and zonepath (-R) passed in automatically. There's not much else that you can specify for a sparse root zone. If you look at the installopts property in config.xml, there's just an h, which means that the user can specify -h (and will get the help).
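The substitution of the zone name and zonepath into that entry can be pictured like this. The sketch only handles the two tokens mentioned here, %z and %R, and only when they appear as whole words; zoneadm's real expansion code is not described in this post, so treat this as an illustration rather than a reimplementation. The zone name web1 and its zonepath are made up.

```python
import shlex
from typing import List

INSTALL_ENTRY = "/usr/lib/brand/sparse-root/pkgcreatezone -z %z -R %R"


def expand_brand_command(template: str, zonename: str, zonepath: str) -> List[str]:
    """Split a config.xml-style command and substitute %z and %R (sketch only)."""
    substitutions = {"%z": zonename, "%R": zonepath}
    return [substitutions.get(word, word) for word in shlex.split(template)]


print(expand_brand_command(INSTALL_ENTRY, "web1", "/export/zones/web1"))
# ['/usr/lib/brand/sparse-root/pkgcreatezone', '-z', 'web1', '-R', '/export/zones/web1']
```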

 For a whole-root zone the install entry is similar, but installopts is now o:O:h - this is like getopts, so it's saying that you can pass the -o and -O flags, and that each must have an argument. These flags are used to define what overlays get installed in a whole-root zone. Having the installopts defined here means that zoneadm can validate the install command.
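Because installopts uses getopts-style syntax, Python's getopt module, which accepts the same option-string notation, is a handy way to see what a spec like o:O:h permits. This mimics the validation idea only; it is not zoneadm's code, validate_install_args is a hypothetical name, and the overlay names below are invented for the example.

```python
import getopt
from typing import List, Tuple


def validate_install_args(installopts: str, args: List[str]) -> List[Tuple[str, str]]:
    """Check install arguments against a getopts-style installopts string.

    Raises getopt.GetoptError for unknown options, options missing their
    argument, or stray operands: the kind of check that can be done before
    the brand's install script is ever run.
    """
    opts, extra = getopt.getopt(args, installopts)
    if extra:
        raise getopt.GetoptError(f"unexpected operands: {extra}")
    return opts


print(validate_install_args("o:O:h", ["-o", "develop"]))  # [('-o', 'develop')]
```

With a sparse-root style spec of just h, passing -o fails, which matches the description above of what each brand allows.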

 So, for a given brand, we've now seen from config.xml what command will be called when you install a zone, and what options it's allowed.

 The point is that there's nothing special here. You can build a custom brand by writing your own install script, and if you need to pass arguments to it you can easily do so as long as you set installopts to match. When building all the zone brands for Tribblix, that's all I did.

 To reiterate, the install script is completely open. For existing ones, you can see exactly what it's going to do. If you want to create one, you can have it do anything you like in order to lay down the files you want in the layout you want.

 As a crazy example, a long time ago I created a brand that built a sparse-root zone on a system using IPS packaging.

 There's a little bit of boilerplate (if you're going to create your own brands, it's probably easier to start with a copy of an existing one so you pick up the common actions that all zone installs do), but after that, the world's your oyster.

 Consider the alien-root zone in Tribblix. If you look at the installer for that, it's just dumping the contents of an iso image, tarball, or zfs send stream into the zone root. It does some cleanup afterwards, but generally it doesn't care what's in the files you give it - you can create an arbitrary software installation, tar it up, and install a zone from it.

 (In fact, I probably won't create more native zone types for Tribblix - the alien-root is sufficiently generic that I would extend that.)
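As a rough illustration of how little an alien-root style installer has to do, here is a sketch that unpacks a tarball into a zone's root directory. It assumes the root lives at zonepath/root, the usual illumos layout, and it leaves out the clean-up step and the iso and zfs send cases entirely; install_from_tarball is a hypothetical name, not the Tribblix installer.

```python
import os
import tarfile


def install_from_tarball(tarball: str, zonepath: str) -> str:
    """Unpack a tarball into <zonepath>/root and return that directory (sketch)."""
    zoneroot = os.path.join(zonepath, "root")
    os.makedirs(zoneroot, mode=0o755, exist_ok=True)
    root_real = os.path.realpath(zoneroot)
    with tarfile.open(tarball) as tar:
        # Basic safety check: refuse member names that resolve outside the zone
        # root (symlink tricks inside the archive are not handled here).
        for member in tar.getmembers():
            target = os.path.realpath(os.path.join(root_real, member.name))
            if os.path.commonpath([root_real, target]) != root_real:
                raise ValueError(f"unsafe path in archive: {member.name}")
        # Honour the metadata recorded in the archive. The filter argument only
        # exists on newer Pythons, so pass it only when it is available.
        if hasattr(tarfile, "fully_trusted_filter"):
            tar.extractall(zoneroot, filter="fully_trusted")
        else:
            tar.extractall(zoneroot)
    return zoneroot
```

The installer genuinely does not care what is in the archive: any tree you tar up lands under the zone root.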

 This generality in scripting goes beyond the install. For example, the prestate and poststate scripts are called before or after the zone transitions from one state to another, and you can therefore get your zone brand to do interesting things triggered by a zone transitioning state. One of the coolest uses here is the way that OmniOS implements on-demand vnics - the prestate script creates a vnic for a zone before a zone boots, and the poststate script tears it down after it halts. (Tribblix uses zap to manage vnics outside of zoneadm, so they're persistent rather than on-demand, it's just a different way of doing things.)
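The on-demand vnic idea boils down to mapping a state transition to a dladm command. The sketch below is a guess at the shape of such a hook, not the OmniOS implementation: the real prestate and poststate calling convention is not described here, so the hook and new_state arguments are simplified, hypothetical stand-ins, and the vnic and link names are made up. The dladm create-vnic and delete-vnic subcommands themselves are standard illumos commands.

```python
from typing import List, Optional


def vnic_command(hook: str, new_state: str, vnic: str, link: str) -> Optional[List[str]]:
    """Return the dladm command an on-demand-vnic hook might run (sketch).

    hook is "prestate" or "poststate"; new_state is the state the zone is
    moving to. Returns None when the transition needs no vnic work.
    """
    if hook == "prestate" and new_state == "ready":
        # Create a temporary vnic before the zone is readied for boot.
        return ["dladm", "create-vnic", "-t", "-l", link, vnic]
    if hook == "poststate" and new_state == "installed":
        # Tear the vnic down once the zone has halted back to installed.
        return ["dladm", "delete-vnic", vnic]
    return None


print(vnic_command("poststate", "installed", "web1_0", "e1000g0"))
# ['dladm', 'delete-vnic', 'web1_0']
```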

 As you can see, you aren't limited to the zone types supplied by your distribution. With enough imagination, you can extend zones in arbitrary ways.

Are software ecosystems a good thing? The Trouble with Tribbles...

 One way to judge the health or strength of a product might be to look at the ecosystem surrounding that product. But is this diagnostic?

 Note that there are several concepts here that are similar to the ecosystem. I'm not referring to the community, those people who might use or support the product. Nor am I talking about a marketplace, which is a source of artefacts that might be consumed by the product. Those are important in their own right, but they aren't what I mean when I'm talking about an ecosystem.

 No, an ecosystem is the set of other services or software that spring up to support or integrate with the product.

 There's one immediate problem here that's obvious if you think about it: much of an ecosystem exists to address flaws or gaps in the product. Something that is more polished, more mature, and more finished will provide fewer opportunities for other products to add value.

 What this means, then, is that a thriving ecosystem is often a sign of weakness and immaturity, not strength. A good product will not need the extras and hangers on that come with an ecosystem.

 The notion of an ecosystem is tied in with that of MVP - Minimum Viable Product. The current trend is to launch a startup with just an MVP, rely on first mover advantage, and hope to actually finish the offering at a later date. By definition, an MVP cannot be complete, and will need a surrounding ecosystem in order to function at all. This is much more common now than in the past, when products - especially proprietary products - were not launched until they were in some sense done.

 Over time, too, an ecosystem will - or should - naturally diminish, as bugs are fixed and missing features filled in. The partners in the ecosystem will get frozen out, as their offerings become irrelevant (think ClusterHQ).

 As an example from the past, consider the ecosystems that built up around Windows and DOS. Whole industries were built on things like TCP stacks and centralized nameservices and storage (PC-NFS, even Netware). These were products reliant on fundamental failings of the product they supported. (Don't even get me started on antivirus software.)

 Fast forward, and I can't be the only one to recognise the CNCF landscape as a disaster area.

Run Docker images on SmartOS ~drscream

This feature is available in the SkyLime SmartOS version because we merged the changes from an existing issue into our branch to support Docker Registry version 2. We did this because most existing Docker images only use version 2, so supporting only version 1 leaves you with far fewer usable images. With this change Docker Registry version 1 is no longer supported, which is the biggest drawback if you already have version 1 images.

SmartOS add delegate dataset later to zone ~drscream

Maybe you forgot to add a delegated dataset to one of your SmartOS or LX-branded zones and would like to add it later. This is possible with zonecfg, but be warned: this is only a workaround and is not based on the official SmartOS documentation.

Memory and CPU usage tools on Illumos ~drscream

This is only a small overview of memory and CPU usage tools on Illumos. I mostly forget the commands, so it’s a personal reference page of sorts :-)

Migrate ZFS pool to a different server ~drscream

I hope you run your server with ZFS already :-)

Samba4 on SmartOS ~drscream

Samba version 4 (Samba4) has been available in SmartOS since pkgsrc release 2014Q4. Be careful: it’s still built from the WIP (work in progress) repository.

Use GnuPG to sign pkgsrc packages ~drscream

The new pkgsrc-2014Q4 release provides signed packages for SmartOS. Many details about the release are covered in an article by Jonathan Perkin, pkgsrc-2014Q4: LTS, signed packages, and more.

Rotate DKIM in mail.core.io environment ~drscream

Rotating the DKIM keys every six months, or even more often, is recommended. Our mail.core.io environment has multiple nodes running, which makes doing it manually a pain in the ass.

Use delegate dataset for samba zone ~drscream

This is a small but hopefully helpful improvement to my blog post Setting up Samba on SmartOS. Thanks to wiedi, who reviewed the blog post. One of the many benefits of using Samba in a SmartOS zone is that you have ZFS.

Setting up Samba on SmartOS ~drscream

This article is based on Jonathan’s awesome Setting up Samba on SmartOS article, so thanks a lot. His blog post is more than two years old but is still what people are pointed to when they ask about setting up Samba on SmartOS. But things have changed: Jonathan added SMF support to pkgsrc, which makes it a lot easier.

Debug SmartOS usr files without rebuild usr.lgz ~drscream

Maybe you’re lazy like me, and don’t want to rebuild SmartOS just to debug some changes in /usr? Be warned first: this is again a workaround and a dirty hack, only for debugging!

Restoring the python dtrace ustack helper OmniOS Community Edition

Prior to version r151034, the Python versions shipped with OmniOS included a patch which added support for a small number of dtrace probes around function entry and exit, and also a dtrace ustack helper. A ustack helper allows a stack trace examined in the context of a dtrace probe to be annotated with useful information; in the case of python, information about the python program stack.

When dtrace support was integrated into the official cpython release of Python, in version 3.6, the ustack helper was omitted. This meant that when OmniOS upgraded to Python 3.7 along with the move to using the built-in dtrace support, the ustack helper was no longer available. We recently revisited this and have just completed restoring the feature to Python 3.10 in OmniOS bloody, which will be the basis for the next stable release in May 2022. This is mostly derived from the original Solaris patch for python 3.5. The python patch for OmniOS is being maintained in a github branch.

We have opted to include this in a separate debug variant of the package since there are some performance overheads in providing the helper, even if it is not used. For it to work reliably, python itself needs to be built without optimisations, every loaded module needs to have a second copy of the line number information stored in a format that is usable by dtrace, and every function call needs to go through another stack frame to ensure that the necessary information ends up on the stack in a place that dtrace can find it. There’s a little more on this in John Levon’s blog post on the original python dtrace implementation.

As an example, I was recently debugging a problem with the OmniOS packaging system unexpectedly removing a directory. Reaching for dtrace to check when the rmdir system call is used is not particularly helpful, since the stack trace only shows the interpreter's native C frames, with no indication of which python code made the call.

    % pfexec dtrace -n 'syscall::rmdir:entry{trace(copyinstr(arg0));jstack()}' \
        -c 'pkg image-create -f /tmp/testimg'

  1  246  rmdir:entry   /tmp/testimg/.org.opensolaris,pkg/publisher
   libc.so.1`_syscall6+0x1b
   libpython3.10.so.1.0`os_rmdir+0x102
   libpython3.10.so.1.0`cfunction_vectorcall_FASTCALL_KEYWORDS+0x61
   libpython3.10.so.1.0`_PyEval_EvalFrameDefault+0x57b4
   libpython3.10.so.1.0`_PyEval_Vector+0x45
   libpython3.10.so.1.0`_PyEval_EvalFrameDefault+0x57b4
   libpython3.10.so.1.0`_PyEval_Vector+0x45
   libpython3.10.so.1.0`method_vectorcall+0x8c
   libpython3.10.so.1.0`_PyEval_EvalFrameDefault+0x1663
   libpython3.10.so.1.0`_PyEval_Vector+0x45
   libpython3.10.so.1.0`_PyObject_FastCallDictTstate+0x108
   libpython3.10.so.1.0`_PyObject_Call_Prepend+0x73
   libpython3.10.so.1.0`slot_tp_init+0x89
   libpython3.10.so.1.0`type_call+0x10b
   libpython3.10.so.1.0`_PyObject_MakeTpCall+0x126
   libpython3.10.so.1.0`_PyEval_EvalFrameDefault+0x6030
   libpython3.10.so.1.0`_PyEval_Vector+0x45
   libpython3.10.so.1.0`_PyEval_EvalFrameDefault+0x1663
   libpython3.10.so.1.0`_PyEval_Vector+0x45
   libpython3.10.so.1.0`_PyEval_EvalFrameDefault+0x3ef
   libpython3.10.so.1.0`_PyEval_Vector+0x45
   libpython3.10.so.1.0`_PyEval_EvalFrameDefault+0x30b0
   libpython3.10.so.1.0`_PyEval_Vector+0x45
   libpython3.10.so.1.0`_PyEval_EvalFrameDefault+0x3ef
   libpython3.10.so.1.0`_PyEval_Vector+0x45
   libpython3.10.so.1.0`PyEval_EvalCode+0x96
   libpython3.10.so.1.0`run_eval_code_obj+0x56
   libpython3.10.so.1.0`run_mod+0x6d
   libpython3.10.so.1.0`pyrun_file+0x81
   libpython3.10.so.1.0`_PyRun_SimpleFileObject+0x395
   libpython3.10.so.1.0`_PyRun_AnyFileObject+0x8b
   libpython3.10.so.1.0`Py_RunMain+0x3d6
   libpython3.10.so.1.0`Py_BytesMain+0x3c
   python3.10`_start_crt+0x87
   python3.10`_start+0x18

With this new update, the debug python variant with the ustack helper can easily be switched in.

        % pfexec pkg change-variant debug.python=true
            Packages to change:  1
     Variants/Facets to change:  1
       Create boot environment: No
Create backup boot environment: No

DOWNLOAD                                PKGS         FILES    XFER (MB)   SPEED
Completed                                1/1     2556/2556    14.5/14.5      --

PHASE                                          ITEMS
Installing new actions                           2/2
Updating modified actions                  2554/2554
Updating package state database                 Done
Updating package cache                           0/0
Updating image state                            Done
Creating fast lookup database                   Done
Reading search index                            Done
Building new search index                    689/689

and now the same stack trace is annotated with the python program information (the indented parts shown in square brackets).

         10  246  rmdir:entry   /tmp/testimg/.org.opensolaris,pkg/publisher
   libc.so.1`_syscall6+0x1b
   libpython3.10.so.1.0`os_rmdir+0x282
   libpython3.10.so.1.0`cfunction_vectorcall_FASTCALL_KEYWORDS+0x5b
   libpython3.10.so.1.0`_PyEval_EvalFrameDefaultReal+0xaf11
     [ /usr/lib/python3.10/shutil.py:708 (rmtree) ]
   libpython3.10.so.1.0`_PyEval_EvalFrameDefault+0x1c
   libpython3.10.so.1.0`_PyEval_Vector+0x48
   libpython3.10.so.1.0`_PyEval_EvalFrameDefaultReal+0xaf11
     [ /usr/lib/python3.10/vendor-packages/pkg/client/image.py:1971 (__set_dirs) ]
   libpython3.10.so.1.0`_PyEval_EvalFrameDefault+0x1c
   libpython3.10.so.1.0`_PyEval_Vector+0x48
   libpython3.10.so.1.0`method_vectorcall+0x185
   libpython3.10.so.1.0`_PyEval_EvalFrameDefaultReal+0x6dd2
     [ /usr/lib/python3.10/vendor-packages/pkg/client/image.py:711 (__init__) ]
   libpython3.10.so.1.0`_PyEval_EvalFrameDefault+0x1c
   libpython3.10.so.1.0`_PyEval_Vector+0x48
   libpython3.10.so.1.0`_PyObject_FastCallDictTstate+0xfa
   libpython3.10.so.1.0`_PyObject_Call_Prepend+0x112
   libpython3.10.so.1.0`slot_tp_init+0xd9
   libpython3.10.so.1.0`type_call+0x8f
   libpython3.10.so.1.0`_PyObject_MakeTpCall+0xa1
   libpython3.10.so.1.0`_PyEval_EvalFrameDefaultReal+0x9db5
     [ /usr/lib/python3.10/vendor-packages/pkg/client/api.py:6749 (image_create) ]
   libpython3.10.so.1.0`_PyEval_EvalFrameDefault+0x1c
   libpython3.10.so.1.0`_PyEval_Vector+0x48
   libpython3.10.so.1.0`_PyEval_EvalFrameDefaultReal+0x6dd2
     [ /usr/bin/pkg:6284 (image_create) ]
   libpython3.10.so.1.0`_PyEval_EvalFrameDefault+0x1c
   libpython3.10.so.1.0`_PyEval_Vector+0x48
   libpython3.10.so.1.0`_PyEval_EvalFrameDefaultReal+0x756f
     [ /usr/bin/pkg:8042 (main_func) ]
   libpython3.10.so.1.0`_PyEval_EvalFrameDefault+0x1c
   libpython3.10.so.1.0`_PyEval_Vector+0x48
   libpython3.10.so.1.0`_PyEval_EvalFrameDefaultReal+0x5326
     [ /usr/bin/pkg:6191 (handle_errors) ]
   libpython3.10.so.1.0`_PyEval_EvalFrameDefault+0x1c
   libpython3.10.so.1.0`_PyEval_Vector+0x48
   libpython3.10.so.1.0`_PyEval_EvalFrameDefaultReal+0x756f
     [ /usr/bin/pkg:8182 (<module>) ]
   libpython3.10.so.1.0`_PyEval_EvalFrameDefault+0x1c
   libpython3.10.so.1.0`_PyEval_Vector+0x48
   libpython3.10.so.1.0`PyEval_EvalCode+0x91
   libpython3.10.so.1.0`run_eval_code_obj+0x54
   libpython3.10.so.1.0`run_mod+0x6c
   libpython3.10.so.1.0`pyrun_file+0x80
   libpython3.10.so.1.0`_PyRun_SimpleFileObject+0x124
   libpython3.10.so.1.0`_PyRun_AnyFileObject+0x44
   libpython3.10.so.1.0`Py_RunMain+0x451
   libpython3.10.so.1.0`Py_BytesMain+0x57
   python3.10`_start_crt+0x87
   python3.10`_start+0x18

Here are those lines extracted, and it’s easy to see that the place to start looking is around line 1971 in the __set_dirs function of image.py.

    [ /usr/lib/python3.10/shutil.py:708 (rmtree) ]
    [ /usr/lib/python3.10/vendor-packages/pkg/client/image.py:1971 (__set_dirs) ]
    [ /usr/lib/python3.10/vendor-packages/pkg/client/image.py:711 (__init__) ]
    [ /usr/lib/python3.10/vendor-packages/pkg/client/api.py:6749 (image_create) ]
    [ /usr/bin/pkg:6284 (image_create) ]
    [ /usr/bin/pkg:8042 (main_func) ]
    [ /usr/bin/pkg:6191 (handle_errors) ]
    [ /usr/bin/pkg:8182 (<module>) ]
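Pulling those annotations out of a long jstack() trace is easy to automate. A minimal sketch, assuming the helper's annotations always have the "[ path:line (function) ]" shape seen in the trace above; python_frames is a hypothetical helper name, not part of dtrace or OmniOS.

```python
import re
from typing import List, Tuple

# Matches annotations such as "[ /usr/bin/pkg:8042 (main_func) ]".
FRAME_RE = re.compile(r"\[\s*(?P<path>\S+):(?P<line>\d+)\s+\((?P<func>[^)]*)\)\s*\]")


def python_frames(trace: str) -> List[Tuple[str, int, str]]:
    """Return (path, line, function) for each annotated frame, innermost first."""
    return [(m["path"], int(m["line"]), m["func"]) for m in FRAME_RE.finditer(trace)]
```

Native frames such as libpython3.10.so.1.0`_PyEval_Vector+0x48 contain no square brackets, so they are skipped, and the first tuple returned is the innermost python frame.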

To switch back to the production version of python:

        % pfexec pkg change-variant debug.python=false

The three strands of Information Technology The Trouble with Tribbles...

How are IT departments structured? I've seen a variety of ways to do this. It depends on the individual business, but over the years I've come up with a way to think about this.

When thinking about Information Technology (IT), it naturally splits into 3 separate strands:

IT for the business

This is the provision of facilities for HR, Finance, Sales, and the like; basic facilities for the organisation to operate as a business

IT for the employee

This is the provision of systems and tools for employees to be able to work at all; laptops/desktops/mobile devices, and communications systems such as telephony and email, together with a way for staff to store and collaborate on documents

IT for the customer

This is the provision of services that your customers use, whether that's a product you sell in its own right, or as a mechanism to sell other products

The relative importance of these 3 strands depends on the nature of the business, of course. And very small organisations might not even have all 3 strands in any meaningful sense.

Structurally, there are two senior roles that an organisation might have: the CIO and the CTO. The natural way to lay things out is for the CIO to look after IT for the business and IT for the employee, while the CTO gets IT for the customer.

Splitting things this way works because the characteristics of the strands are quite different. The responsibilities of the CIO are inward-facing, those of the CTO are outward-facing. The work of the CIO is about managing standardised commodities, while the CTO's role is to provide differentiation. Polar opposites, in a way.

There's a third role, that of the CISO, responsible for information security. This is slightly different in that it cuts across all 3 strands. As such, if you have both a CIO and a CTO, it isn't entirely obvious which of the two, if either, should take on the CISO role.

Given the different nature of these 3 strands, where does the IT department (loosely defined as those people whose job is IT) fit? Should you even have one? The job requirements for the 3 strands are sufficiently different that having different IT teams for each strand would seem to make an awful lot of sense, rather than a central IT department. And the IT team for each strand reports to the CIO or CTO as appropriate. In particular, having a product developed in the CTO part of the organisation and then thrown over the wall to be run by an operations team in the CIO organisation is one of the organisational antipatterns that never made any sense and was a major driver for DevOps.

Thus, when structuring the delivery of IT in an organisation, the divergent needs of the 3 different IT strands ought to be taken into account. The worst case is a single department that standardises on the same solution to deliver all 3 strands. Standardisation is a common refrain of management, but what it really means here is that at least 2 strands (if not all 3) are delivered in a sub-standard way, often in a way that's actually completely unsuitable.

There is one central IT function that does cut across all 3 strands, in the same way that a CISO does at the management level: a compliance function or security office. But for most other functions, you're really looking at providing distinct deliveries for each strand.

The cost of cloud The Trouble with Tribbles...

Putting your IT infrastructure into the cloud seems to be the "in" thing. It's been around for a while, of course. And, like most things related to IT, there are tradeoffs to be made.

My rough estimate is that the unit cost of provisioning a service on AWS is about 3 times that of a competent IT organization providing a similar service in house. Other people have come to the same number, and it hasn't really changed much over the last decade. (If you don't think 3x is right, consider what AWS' gross margin is.)

Some services offered by AWS deviate from that simple 3x formula. The two obvious ones are network costs, which, as Cloudflare have argued, are many times higher than you would expect, and S3, which you're going to struggle to beat. (Although if you're going to use S3 as a distribution site then the network costs will get you; think about Wasabi for that.)

And yet, many organizations move to the cloud to "save money". I'm going to ignore the capex versus opex part of that, and simply note that many IT organizations have in-house operations that are neither very efficient nor cost-effective. In particular, traditional legacy IT infrastructures are ridiculously overpriced. (If you're using commercial virtualization platforms and/or SAN storage, then you're overpaying by as much as a factor of 10, and getting an inferior product into the bargain - so that while many organizations could save a huge amount of money by moving to the cloud, they could save even more by running their internal operations better.)

Often the cost saving associated with a migration - not just cloud, this applies to other transitions too - comes about not because the new solution is cheaper, but because a migration gives a business leverage to introduce better practices. Practices that, if used for your on-premise deployments, would save far more than the cloud ever could. Sometimes, you need to do an end run round an entrenched legacy IT empire.

Another consideration is that the cloud has often been touted as something where you pay for what you use, which isn't always quite correct. For many services, you pay for what you configure. And some services are nowhere near as elastic as you might wish.

Capacity planning doesn't go away either; it's actually more important to get the sizing right. And while you can easily buy more capacity, you have to ensure you have the financial capacity to pay the bills.

Note that I'm not saying you should always run your systems on-premise, nor that it will always be cheaper.

Below a certain scale, doing it yourself isn't financially beneficial. There's a minimum configuration of infrastructure you need in order to get something that works, and many small organizations have needs below that. But generally, the smaller providers are likely to be a better option in that case than full-on cloud offerings.

Having the operational capability to support your infrastructure is also crucial. If you're going to support your own hardware, you really need a team, which is going to set a minimum scale at which operations are worthwhile.

This becomes even more true if you need to deploy globally. It's difficult to do that in-house with a small team, and you have to be pretty large to be able to staff multiple teams in different geographies. A huge advantage of using the cloud for this is that you can deploy into pretty much any location without driving your costs insane. Who wants to hire a full team in every country you operate in? And operationally, it's the same wherever you go, which makes things a lot easier.

In recent times, the Coronavirus pandemic has also had an impact. End user access to colocation facilities has been restricted - we've been able to do repairs, recently, but we've had to justify any datacenter visits as essential.

There are certain workloads that are well matched to the cloud, of course. Anything highly variable, with spikes above 3x the background, will be cheaper in the cloud, where you can deploy capacity just for the spike, than in house, where you either overprovision for peak load or accept that there's a spike you can't handle.
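The 3x rule of thumb for spiky workloads falls out of some simple arithmetic. The sketch below assumes, purely for illustration, hourly load figures in arbitrary capacity units, an in-house setup that must be sized for the peak and paid for every hour, and a perfectly elastic cloud that bills 3 times the in-house unit cost but only for capacity actually used. As noted above, real cloud services often bill for what you configure rather than what you use, so this is a best case for the cloud.

```python
from typing import Sequence


def in_house_cost(load: Sequence[float], unit_cost: float = 1.0) -> float:
    """In house you buy for the peak, and pay for it every hour."""
    return max(load) * unit_cost * len(load)


def cloud_cost(load: Sequence[float], unit_cost: float = 1.0, markup: float = 3.0) -> float:
    """In an ideal cloud you pay the markup, but only for what each hour uses."""
    return sum(load) * unit_cost * markup


spiky = [10.0] * 99 + [40.0]   # one-hour spike to 4x the background
gentle = [10.0] * 99 + [20.0]  # one-hour spike to 2x the background

print(in_house_cost(spiky), cloud_cost(spiky))    # 4000.0 3090.0
print(in_house_cost(gentle), cloud_cost(gentle))  # 2000.0 3030.0
```

When the spikes are brief, the cloud bill is roughly 3 times the background while the in-house bill tracks the peak, so the crossover sits close to a peak of 3x the background.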

The cloud is also great for experimentation. You can try any number of memory and CPU configurations to see what works well. Much easier than trying to guess and buying equipment that isn't optimal. (This sort of sizing exercise is far less relevant if you have decent virtualization like zones.)

You can even spin up a range of entirely different systems. When testing, I do exactly this: I run each of a whole range of Linux distros for an hour or so.

What the above cases say is that even if the unit cost of cloud resources is high, the cloud gives you more of an opportunity to optimize the number of units. And, when it comes to scaling, this means the ability to scale down is far more important than the ability to scale up.

I use AWS for a lot of things, but I strongly regard the cloud as just another tool, to be used as occasion demands, rather than because the high priests say you should.

Testing a C Library With Python Staring at the C

It’s still common for a systems library to be written in the default lingua franca, C, although Rust is encroaching, for good reasons.

However, when it comes to testing, things get tedious quickly: writing unit or component tests in C is a slow, bug-prone exercise. With libvfio-user, after fixing too many bugs that were due to the test rather than the test subject, I decided it would be worth looking at alternative approaches. The aim was to reduce the time it takes to develop unit/component tests.

Up until this point, we’d been using ctest, along with cmocka when we needed to mock out certain functions (such as socket handling). Leaving aside my strong feelings on these tools, this was rather unsatisfactory: libvfio-user effectively implements a (UNIX) socket server, but we weren’t actually testing round-trip interactions for the most part. In terms of code coverage, very little useful could be done via this unit testing approach, but the “sample” client/server was tedious to work with for testing purposes.

Python-based testing

After a quick proof of concept, it became clear that using Python would be a great choice to cover most of our testing needs. libvfio-user doesn’t ship with any client bindings, and, given that the main clients are qemu, cloud-hypervisor and SPDK, Python bindings would be of dubious utility.

As a result, we decided against “proper” Python bindings, auto-generated or otherwise, in favour of a small and simple approach. In particular, by using the terrible magic of ctypes, we could easily set up both client and server test cases that fully represent how the library works in real life.

So, instead of auto-generated bindings, we write - by hand - simple, thin layers of type wrappers:

    import ctypes as c
    from ctypes import Structure

    class vfio_irq_info(Structure):
        _pack_ = 1
        _fields_ = [
            ("argsz", c.c_uint32),
            ("flags", c.c_uint32),
            ("index", c.c_uint32),
            ("count", c.c_uint32),
        ]

small harness routines for socket handling

    def connect_client(ctx):
        sock = connect_sock()

        json = b'{ "capabilities": { "max_msg_fds": 8 } }'
        # struct vfio_user_version
        payload = struct.pack("HH%dsc" % len(json), LIBVFIO_USER_MAJOR,
                              LIBVFIO_USER_MINOR, json, b'\0')
        hdr = vfio_user_header(VFIO_USER_VERSION, size=len(payload))
        sock.send(hdr + payload)
        ...

… interacting with the library on the server side

    def get_pci_header(ctx):
        ptr = lib.vfu_pci_get_config_space(ctx)
        return c.cast(ptr, c.POINTER(vfu_pci_hdr_t)).contents

… and so on. Writing this by hand might seem immensely tedious, but in practice, as it’s pretty much all boilerplate, it’s very quick to write and modify, and easily understandable; something that can rarely be said for any kind of auto-generated code.
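Wrappers like these can be exercised without the library at all, which is a cheap way to catch layout mistakes before they turn into confusing test failures. The sketch below assumes c is the ctypes module, as the c.c_uint32 and c.cast calls suggest. It checks the size of a packed structure and round-trips it through bytes, shows what the "HH%dsc" format in connect_client produces, and demonstrates that c.cast(...).contents is a view onto memory rather than a copy. The version numbers and the pci_ids structure are hypothetical stand-ins, not libvfio-user definitions.

```python
import ctypes as c
import struct


class vfio_irq_info(c.Structure):
    _pack_ = 1
    _fields_ = [
        ("argsz", c.c_uint32),
        ("flags", c.c_uint32),
        ("index", c.c_uint32),
        ("count", c.c_uint32),
    ]


class pci_ids(c.Structure):
    # Hypothetical stand-in for the start of a PCI config space header,
    # which begins with the 16-bit vendor and device IDs.
    _pack_ = 1
    _fields_ = [("vendor_id", c.c_uint16), ("device_id", c.c_uint16)]


# 1. A packed structure survives a trip through raw bytes, as on the wire.
info = vfio_irq_info(argsz=c.sizeof(vfio_irq_info), flags=0, index=2, count=1)
raw = bytes(info)
parsed = vfio_irq_info.from_buffer_copy(raw)

# 2. The version payload: two unsigned shorts, the JSON text, one NUL byte.
MAJOR, MINOR = 0, 1  # placeholders, not the library's real version constants
caps = b'{ "capabilities": { "max_msg_fds": 8 } }'
fmt = "HH%dsc" % len(caps)
payload = struct.pack(fmt, MAJOR, MINOR, caps, b"\0")

# 3. cast() reinterprets memory in place; writes through .contents land in the buffer.
config_space = c.create_string_buffer(256)
hdr = c.cast(config_space, c.POINTER(pci_ids)).contents
hdr.vendor_id = 0x1AF4
hdr.device_id = 0x1000
```

Things worth checking: c.sizeof(vfio_irq_info) should be 16; the payload length should be 2 + 2 + len(caps) + 1, since the byte-sized items need no padding; and the first four bytes of config_space should hold the two IDs in native byte order.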

Client/server interactions

Another observation was that, for the purposes of these tests, we really didn’t need a client process and a server process: in fact, we don’t even need more than one thread of execution. If we make each test round-robin between acting as the client, then acting as the server, it becomes trivial to follow the control flow, and understanding logs, debugging, etc. is much easier. This is illustrated by the msg() helper:

    def msg(ctx, sock, cmd, payload=bytearray(), expect_reply_errno=0, fds=None,
            rsp=True, expect_run_ctx_errno=None):
        """
        Round trip a request and reply to the server. vfu_run_ctx will be
        called once for the server to process the incoming message, and
        @expect_run_ctx_errno checks the return value of vfu_run_ctx. If a
        response is not expected then @rsp must be set to False, otherwise this
        function will block indefinitely.
        """
        # FIXME if expect_run_ctx_errno == errno.EBUSY then shouldn't that imply
        # that rsp == False?
        hdr = vfio_user_header(cmd, size=len(payload))

        if fds:
            sock.sendmsg([hdr + payload], [(socket.SOL_SOCKET, socket.SCM_RIGHTS,
                                            struct.pack("I" * len(fds), *fds))])
        else:
            sock.send(hdr + payload)

        ret = vfu_run_ctx(ctx, expect_errno=expect_run_ctx_errno)
        if expect_run_ctx_errno is None:
            assert ret >= 0, os.strerror(c.get_errno())

        if not rsp:
            return

        return get_reply(sock, expect=expect_reply_errno)

We act as the client when we send the request (with send(), or with sendmsg() when passing file descriptors); the server then processes that message via vfu_run_ctx(), before we “become” the client again and receive the response via get_reply().
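The same single-threaded pattern can be shown without libvfio-user at all. In this sketch, a toy server step stands in for vfu_run_ctx() and handles exactly one message per call. The 8-byte header format, toy_server_step() and round_trip() are all hypothetical and do not follow the vfio-user wire format. With small messages, a socketpair's buffers hold the whole request and reply, so nothing blocks even though one thread plays both roles:

```python
import socket
import struct

# Hypothetical header: command (u16), status (u16), payload length (u32).
HDR = struct.Struct("=HHI")


def recv_exact(sock, n):
    """Read exactly n bytes from a stream socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the socket")
        buf += chunk
    return buf


def toy_server_step(sock):
    """Handle exactly one request, much as msg() calls vfu_run_ctx() once."""
    cmd, _, size = HDR.unpack(recv_exact(sock, HDR.size))
    payload = recv_exact(sock, size)
    reply = payload[::-1]  # toy behaviour: reply with the payload reversed
    sock.sendall(HDR.pack(cmd, 0, len(reply)) + reply)


def round_trip(client, server, cmd, payload):
    client.sendall(HDR.pack(cmd, 0, len(payload)) + payload)  # act as client
    toy_server_step(server)                                   # act as server
    rcmd, status, size = HDR.unpack(recv_exact(client, HDR.size))  # client again
    return rcmd, status, recv_exact(client, size)


client, server = socket.socketpair()
print(round_trip(client, server, 7, b"abc"))  # (7, 0, b'cba')
client.close()
server.close()
```

Because each step runs to completion before the next one starts, a failure always points at a single, well-defined stage of the exchange.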

We can then implement an individual test like this:

    def test_dma_region_too_big():
        global ctx, sock

        payload = vfio_user_dma_map(argsz=len(vfio_user_dma_map()),
            flags=(VFIO_USER_F_DMA_REGION_READ |
                   VFIO_USER_F_DMA_REGION_WRITE),
            offset=0, addr=0x10000, size=MAX_DMA_SIZE + 4096)

        msg(ctx, sock, VFIO_USER_DMA_MAP, payload, expect_reply_errno=errno.ENOSPC)

which we can run via make pytest:

    ...
    ___________________________ test_dma_region_too_big ____________________________
    ----------------------------- Captured stdout call -----------------------------
    DEBUG: quiescing device
    DEBUG: device quiesced immediately
    DEBUG: adding DMA region [0x10000, 0x80000011000) offset=0 flags=0x3
    ERROR: DMA region size 8796093026304 > max 8796093022208
    ERROR: failed to add DMA region [0x10000, 0x80000011000) offset=0 flags=0x3: No space left on device
    ERROR: msg0x62: cmd 2 failed: No space left on device
    ...
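The numbers in that log follow directly from the test's parameters. The sketch below assumes that MAX_DMA_SIZE is 8 TiB (2**43 bytes), the "max" value reported in the error line, and checks the arithmetic:

```python
# Assumption: MAX_DMA_SIZE is 8 TiB, inferred from the "max" in the log.
MAX_DMA_SIZE = 8 * 1024**4
addr = 0x10000
size = MAX_DMA_SIZE + 4096  # one page too big, as in test_dma_region_too_big()

end = addr + size  # the log prints the region as the half-open [addr, end)

print(size, MAX_DMA_SIZE)   # 8796093026304 8796093022208
print(hex(addr), hex(end))  # 0x10000 0x80000011000
assert size > MAX_DMA_SIZE  # hence the ENOSPC reply the test expects
```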

This is many times easier to write and test than trying to do the same in C, whether with separate client and server processes or by using mocking. And we can be reasonably confident that the test is meaningful, as we are really executing all of the library’s message handling.

Debugging/testing tools

With a little bit of tweaking, we can also use standard C-based tools like valgrind and gcov. Code coverage is simple: after defeating the mini-boss of cmake, we can run make gcov and get code-coverage results for all C code invoked via the Python tests - it just works!

Running Python tests with valgrind was a little harder: for leak detection, we need to make sure the tests clean up after themselves explicitly. But Python itself also generates a lot of valgrind noise. Eventually we found that this valgrind invocation worked well:

    PYTHONMALLOC=malloc \
    valgrind \
        --suppressions=$(CURDIR)/test/py/valgrind.supp \
        --quiet \
        --track-origins=yes \
        --errors-for-leak-kinds=definite \
        --show-leak-kinds=definite \
        --leak-check=full \
        --error-exitcode=1 \
        $(PYTESTCMD)

We need to force Python to use the system allocator (PYTHONMALLOC=malloc), and add a number of suppressions for internal Python valgrind complaints - I was unable to find a working standard suppression file for Python, so I had to construct one myself based upon the Python versions in our CI infrastructure.

Unfortunately, at least on our test systems, ASAN was completely incompatible, so we couldn’t directly run that for the Python tests.

Summary

The approach I’ve described here has worked really well for us: it no longer feels immensely tedious to add tests along with library changes, which can only help improve overall code quality. The tests are quick to run and modify, and for the most part it’s easy to understand what they are actually doing.

There have been a few occasions where ctypes has been difficult to work with - for me the documentation is particularly sparse, and callbacks from the C library into Python are distinctly non-obvious - but we’ve so far always managed to battle through, and twist it to our needs.

Doing things this way has a few other drawbacks: it’s not clear, for example, how we might test intermittent allocation failures, or other failure injection scenarios. It’s also not really suitable for any kind of performance or scalability testing.

I’m curious if others have taken a similar approach, and what their experiences might be.

Keeping Java alive on illumos The Trouble with Tribbles...

Back in 2019, a new JEP (JDK Enhancement Proposal) appeared.

JEP 362: Deprecate the Solaris and SPARC Ports

Of course, for those of us running Solaris or illumos (which is the same platform as far as Java is concerned), this was a big deal. Losing support for a major language on the platform was potentially a problem.

The stated reason for removal was:

Dropping support for these ports will enable contributors in the OpenJDK Community to accelerate the development of new features that will move the platform forward.

Clearly, this reflected a belief that maintaining Solaris and/or SPARC was a millstone dragging Java down. Still, it's their project and they can make whatever decisions they like, regardless of those of us who thought it was a bad move.

Eventually, despite objections, the ports were removed, towards the end of the JDK15 cycle.

At which point I simply carried on building OpenJDK. All I did was take the patch from the commit that removed Solaris support, apply it in reverse, and add on top the pkgsrc patches that Jonathan Perkin had originally developed to support a gcc port on Solaris and illumos - patches we had already been using extensively from JDK11 onwards.

At that point I wasn't quite sure how sustainable this was. My aim was to support it as long as it wasn't proving too onerous or difficult, and my most optimistic hope was that we might be able to get to Java 17, which was planned as the next LTS release.

The modus operandi was really very simple. Every week a new tag is created. Download the tag, apply the patches, fix any errors in the patch set, try a build, and hopefully fix any problems breaking the build.
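As a rough illustration of that weekly routine, here is a sketch that builds the list of steps rather than running them blindly. Several details are assumptions chosen only for illustration: the upstream repository URL, the checkout directory name, the patch file names and the boot JDK path. The real workflow may differ. `bash configure --with-boot-jdk=...` and `make images` are the standard OpenJDK build entry points, and the GNU patch `--dry-run` option shows whether a patch still applies before anything is modified:

```python
import os
import subprocess


def weekly_build_plan(tag, patch_files, boot_jdk,
                      repo="https://github.com/openjdk/jdk.git"):
    """Return (cwd, argv) steps for one weekly cycle (hypothetical layout).

    tag         -- upstream build tag to check out
    patch_files -- this release's patch files, applied in sorted order
    boot_jdk    -- previously built JDK used to bootstrap the build
    """
    src = "jdk-src"  # hypothetical checkout directory
    steps = [(".", ["git", "clone", "--depth", "1", "--branch", tag, repo, src])]
    for path in sorted(patch_files):
        path = os.path.abspath(path)
        # Dry run first: a failure here means the patch set needs fixing.
        steps.append((src, ["patch", "-p1", "--dry-run", "-i", path]))
        steps.append((src, ["patch", "-p1", "-i", path]))
    steps.append((src, ["bash", "configure", "--with-boot-jdk=" + boot_jdk]))
    steps.append((src, ["make", "images"]))
    return steps


def run_plan(steps):
    """Execute each step, stopping at the first failure."""
    for cwd, argv in steps:
        subprocess.run(argv, cwd=cwd, check=True)


# Hypothetical tag, patch names and boot JDK path; prints the plan only.
plan = weekly_build_plan("jdk-18+25",
                         ["patches/0002-b.patch", "patches/0001-a.patch"],
                         "/opt/jdk17")
for cwd, argv in plan:
    print(cwd, " ".join(argv))
```

Separating plan construction from execution makes it easy to review what a given week's run will do before calling run_plan().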

Rinse and repeat, every week. The idea is that by doing it every week, each set of changes is relatively small and manageable. Some weeks, it's just line number noise in the patches. Other weeks, it could be a more significant change. By loitering on the mailing lists, you become aware of what changes are coming up, which gives you a good idea of where to look when the build breaks.

Along the way, I've been cleaning up the patches to eliminate the SPARC code (you could put it back, but it's not a focus of this project) and most of the code to support the Studio toolchain (the version of Studio needed to build current Java isn't compatible with illumos anyway). So what we're left with is a straightforward Solaris/illumos+gcc port.

Most of the code changes I've needed to make are fairly straightforward procedural changes. Some functions moved namespace. Some function signatures have been changed. There's also been a lot of work to consolidate a number of functions into common posix code, rather than have each OS provide different implementations which might diverge and become hard to maintain.

Most of this was pretty simple. The only one that caused me a significant amount of work was the signal handling rewrite, which took several attempts to get to work at all.

And it's become fairly routine. Java 17 came along, eventually, and the builds were still succeeding and basic smoke-testing worked just fine. So, illumos has Java 17 available, just as I had hoped.

I originally packaged the builds on Tribblix, of course, which is where I'm doing the work. But I've also dropped tarballs of occasional builds so they can be downloaded and used on other illumos distributions.

Actually, the idea of those builds isn't so much that they're useful standalone, but that they provide a bootstrap JDK that you can use to build Java yourself. Which, given that bootstrap JDK and my Java patches, ought to be fairly straightforward. (There's a separate patch directory for each JDK release - the directory name ought to be obvious.)

Which means that if you want Java 17 on OmniOS, you can have just that - it's built and packaged ready for you. Not only that, Dominik fixed some problems with my signal handling fix so it works properly and without errors, which benefits everyone.

It doesn't stop there. In addition to the stream of quarterly updates (JDK 17, being an LTS release, will see these for some time yet), work is continuing on mainline. JDK 18 works just fine, and as it's ramping down for release it shouldn't have any breaking changes, so that's another release supported. I'm building JDK 19, although that's only about one build in, so it hasn't really had any significant changes put into it yet.

The fact that a relatively unskilled developer such as myself can maintain an out-of-tree Java port for a couple of years, tracking all the upstream changes, does make you wonder if supporting Solaris was really that much of a blocker to progress. At the time my belief was that it wasn't Solaris support that was the problem, but the Studio toolchain, and I think that's been borne out by my experience. Not only that, but the consolidation and simplification of the various OS-specific code into common posix code shows that supporting a variety of modern operating systems really isn't that hard.

OpenIndiana Hipster 2021.10 is here openindiana

Another 6 months have passed and we are proud to announce the release of our 2021.10 snapshot. The images are available at the usual place. As usual we have automatically received all updates that have been integrated into illumos-gate.

The new images are interesting for people with newer hardware that hasn’t been supported in the past. There is no need to reinstall from the newer images, as OpenIndiana Hipster is a rolling release and will bring in all updates with a simple “pfexec pkg update -v”.

More details will follow when time permits…