Adventures in Server Rooms The Trouble with Tribbles...

I'm a fan of the cloud. Honest. Providing someone else is paying for it, that is, given how ridiculously expensive it is.

But running stuff in the cloud is a whole lot easier for me. Someone else fixing hardware and infrastructure is a big win. No more driving to the datacenter at 3 in the morning. No more wandering into a room that's baking at 70C because the AC failed. No crawling about under the raised floor trying to trace that cable. No having a disk array in pieces because vendor A's SCSI adapter doesn't quite work right with vendor B's SCSI disk drive.

That said, some of the things that happened while I was fixing hardware do stand out.

Like the time I was putting together one of these stupid cable management arms. Feeling very satisfied with myself, I tried to step back only to discover that I had managed to velcro and cable tie myself into the damn thing. Unable to actually move the one arm, or get round it with the other, I eventually wriggled out of the stuck sweater and spent another 10 minutes rescuing my clothing from the clutches of the evil server.

Or, one morning, when I came in to find an SGI Challenge dead as a dodo. This was the under-the-desk model, back in the day when every machine had its own VT320 or similar perched on the desk above or alongside. I speak to SGI, and they assure me that everything's fine, I just need to replace the fuse. I dutifully do so and crawl under the desk to turn it back on, with my boss (the benefits of rank!) standing as far away as possible. One huge flash and an enormous bang were followed by another bang as my head slammed into the underside of the desk. Some sharp words with the SGI support line followed.

Before we had datacenters, we had server rooms. Before that, we had cupboards. As a temporary measure, the campus router (oh, the days when you could find a Cisco AGS+ anywhere) was in a cupboard. The air conditioning unit extracted moisture from the air into what was effectively a bucket on a scale. When the bucket was full, it was heavy enough to tip the scale, which powered off the AC unit - after all, you don't want puddles all over the floor. Depending on humidity, this wouldn't last overnight, let alone a weekend, resulting in extra trips across the county to empty the thing out of hours.

Servers are relatively benign beasts compared to the unreliable monstrosities responsible for datacenter power and cooling. In the early days at Hinxton, we had regular power cuts. The generator would decide to cut in, but it took several seconds to spin up to speed. It turned out that the generator's thresholds for detecting low voltage had been set back in the days when mains voltage was 240V. Now it's 220V, and the official range is technically the same, but when all the local farms brought the cows in for milking there was enough of a drop in voltage that the generator thought there was going to be a loss of power and kicked in unnecessarily.

Once, when it cut over, the bus bars stuck halfway, fused, and the generator shed had a pool of molten copper on the floor. It was some hours before that had cooled far enough for the electricians to safely enter the building.

When we got a new building, it was much bigger, so the generators they put in were sized for critical equipment only. Along comes the first big mains outage, and whoever had assigned the "critical" circuits had done so essentially at random. Specifically, our servers (which had UPS protection to tide them through blips long enough for the generators to come on stream) clearly weren't getting any juice. There was a lot of yelling, and we had to shut half the stuff down to buy time, before they connected up the right circuits.

Someone higher up decided we needed fire protection. So that was put in and wired up; the idea is that it kills all power and cooling. There's also a manual switch, placed by the door so you can hit the big red button as you're legging it out of there. Too close to the door, as it turns out: on the wall exactly where the door handle goes when you open the door.

They installed a status panel on the wall outside too. The electrician wiring this up never satisfactorily explained how he managed to cut all power, although he must have almost had a heart attack when a posse of system administrators came flying out of offices demanding to know what had happened.

Numeracy isn't known to be a builder's strong point. Common sense likewise. So the datacenter has an outside door, then a ramp up to an inner door. We believed in having a decent amount of room under the raised floor - enough to crawl under if necessary. So they put in the doorframe, but hadn't allowed for the raised floor. When we went for the initial inspection, we had to crawl through the door, it was that low.

Even the final version wasn't really high enough. The old Sun 42U racks are actually quite short - the power supply at the base eats into the dimensions, so the external height is just over 42U and there was really only 36U available for servers. When we ordered our first Dell 42U rack, we had to take all the servers out, half dismantle it, lay it down, and it took six of us to wangle it through the dogleg and the door (avoiding the aforementioned big red emergency cut off button!). After that, we went out and ordered custom 39U racks to be sure they would fit.

When you run a network connection to a campus, you use multiple routes - so we had one fibre running to Cambridge, another to London. Unfortunately, as it turned out, there's about 100 yards from the main building ingress to the main road, and someone cheated and laid the two fibres next to each other. There was some building work one day, and it had carefully been explained where the hole should and shouldn't be dug. Mid-afternoon, Sun Net Manager (remember that?) went totally mental. We go to the window, and there's a large hole, a JCB, and the obligatory gaggle of men in hard hats staring into the hole. My boss says "They're not supposed to be digging there!" and, indeed, they had managed to dig straight through the cables. If it had been a clean cut it wouldn't have been so bad (there's always a little slack so they can splice the fibres in the event of a break), but they had managed to pull the fibres out completely, so they had to be blown back from London/Cambridge before we got back on the air.

Hello Oxide Tracing Kernel Functions: How the illumos AMD64 FBT Provider Intercepts Function Calls

My first day at Oxide Computer Company.

Retiring Guilt Josef "Jeff" Sipek

It took me about 3 years to write this post. Partly because I had other things I wanted to work on and partly because I hoped that it wouldn’t be needed. Well, I finally decided that I really need to write this.

In short, I’m officially stopping work on guilt.

Practically speaking, I haven’t touched it as a developer in over two years, and as a user for about as long. So really, nothing will change.

What is guilt?

I started writing Guilt in fall 2006 because I was working on unionfs and needed to maintain patches on top of the Linux kernel git repository—much like what the mq extension did with Mercurial repositories.

It all started with:

commit 664e5a7d7f8d2c2726f03a239de11fa00127cf84
Author: Josef Sipek <jsipek@thor.fsl.cs.sunysb.edu>
Date:   Mon Nov 6 13:08:30 2006 -0500

    Initial commit

That’s right, 14 years to the day.

Technically, the first few versions were called “gq” (which stood for “git quilt”) until someone pointed out that “GQ” was a well established GTK-based LDAP client.

Artifacts

If anyone wishes to resurrect this project, then by all means go for it. If not, the old content will remain online for as long as I have a web server. :)

Specifically, you can find everything up to and including the last release (v0.37-rc1) at the following locations:

Users

I know that Guilt has served a number of people quite well over the years. It’s been quite stable and mostly feature complete since at least 2008, so I haven’t really heard from people, aside from the occasional patch or the occasional “oh yeah, I use that”.

To those users: I hope the last release works well enough for you until someone starts to maintain Guilt again or you find a different tool that suits your needs.

Mutt and Office365 Staring at the C

For reasons, I now need to interact with Office365 mail and calendar. It should go without saying that the Outlook webapp is almost painfully unusable (there really is no key binding for “next unread email”). Thus began the quest to get mutt interacting with the O365 server. This was a rather painful process: so much of the information on the web refers to earlier authentication schemes, Microsoft-special protocols, things that don’t support 2FA, dead Linux software, useless Linux software, etc.

After many false starts, I eventually found a working solution that allows mutt usage (and my treasured key binding for “mark current thread as read and move to the next unread email”). That solution is davmail. Yes, it’s on sourceforge, and yes, it’s Java, but it works perfectly.

It’s not very well-documented, but you can run it in headless mode and still do the interactive OAuth2 flow needed with modern O365. Your settings should include:

davmail.mode=O365Manual
davmail.url=https://outlook.office365.com/EWS/Exchange.asmx

When davmail starts, it will ask you to visit a URL and paste the resulting URL back; that URL contains the OAuth2 tokens it needs. No need for any GUI!
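
For reference, here is a fuller davmail.properties sketch for headless use. Note that the listener ports below are my assumptions, chosen to line up with the fetchmail and mutt configuration that follows:

davmail.server=true
davmail.mode=O365Manual
davmail.url=https://outlook.office365.com/EWS/Exchange.asmx
# local listener ports; any free ports will do
davmail.imapPort=1143
davmail.smtpPort=1025
davmail.caldavPort=1080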

Once davmail is running, your .fetchmailrc can be:

poll localhost protocol IMAP port 1143
 auth password username "username@company.com"
 is localuser here
 sslmode none
 keep
 mda "/usr/bin/procmail -d %T"
 folders INBOX,etc,etc

Note that since davmail is running locally, there’s not really any need for SSL, though you can set that up if you like.

When you start fetchmail, enter your password, and that will initiate the auth flow against the davmail instance. Note that you’re not storing passwords anywhere, unlike the old-style app password approach you might have used previously on gmail and the like.

I don’t need to send mail often, so I have mutt set up like this:

set smtp_url= "smtp://username@company.com@localhost:1025/"
unset smtp_pass
set ssl_starttls=no
set ssl_force_tls=no

Having to enter my password each time is not a big deal for me.

Equally, I have my calendar app set up to pull my calendar over CalDAV from davmail. Works great. I’d love to be able to pull my O365 calendar into Google Calendar, but apparently Google and Microsoft are unable - or more likely unwilling - to make this work in any meaningful way.

I’m pretty sure it’s possible to adapt Google’s OAuth2 scripts to directly use fetchmail with O365’s modern auth stuff, but I’m not sure I have the energy to figure it out - and as far as I can find, nobody else has?

OmniOS Community Edition r151036 Stable OmniOS Community Edition

The OmniOS Community Edition Association is proud to announce the general availability of OmniOS - r151036.

OmniOS is published according to a 6-month release cycle; r151036 stable will be supported for a year. The old stable r151032 release is now end-of-life and will receive no further updates. See the release schedule for further details.

Release Notes and Upgrade Instructions

For full release notes, including upgrade instructions, see omnios.org/upgrade.

If you upgrade from r151030 and want to see all new features added since then, make sure to also read the release notes for the intervening r151032 and r151034 releases.

OmniOSce Newsletter

Since the start of the OmniOS Community Edition project, we have predominantly announced our new releases via Twitter. We are now also offering a newsletter with announcements of updates, bug fixes and new releases. You can subscribe here.

Commercial Support

Have you ever wondered how OmniOS development gets financed? You may have noticed that there is no big company bankrolling it all. We keep afloat because companies that rely on OmniOS-powered servers take out support contracts for their hardware. How about you? Visit omnios.org/support for more details and to generate a quote. If you aren’t in a position to take out a support contract, please consider becoming an OmniOS patron to help secure its future - omnios.org/patron.

About the OmniOS Community Edition Association: this Swiss association is responsible for the ongoing development and maintenance of OmniOS, having been established in summer 2017 after OmniTI announced its withdrawal from the project.

OmniOSce Association Aarweg 17, 4600 Olten, Switzerland

OpenIndiana Hipster 2020.10 is here openindiana

After almost 6 months of development we have released the new OpenIndiana Hipster snapshot 2020.10 right on time on October 31st, 2020. Some of the noticeable changes are:

  • After bhyve was upstreamed into illumos-gate, we got it automatically. With the help of the OmniOSce people, we were able to supply a bhyve-branded zone for OpenIndiana.
  • FreeRDP and Remmina have been added.
  • The Squeak Virtual Machine (v4, v5 and v5 Cog) has been added, so it’s possible to run older and newer versions of Squeak Smalltalk, and also Cuis Smalltalk, on OpenIndiana.
  • MATE, our main desktop environment, has been updated to 1.24.
  • GCC 10 has been added.

A bunch of other packages have also been updated. More information can be found in the 2020.10 release notes, and new media can be downloaded from http://dlc.openindiana.org.

Alas, we weren’t able to update some important packages like Firefox and Thunderbird. These are high on our agenda, but due to our limited developer and testing capacity we haven’t been able to get to them yet. We are planning improvements to our infrastructure in order to lower the barriers to testing and developing OpenIndiana Hipster. So, if you are not yet involved but interested, feel free to contact us either on our mailing lists or on IRC.

Tracing Kernel Functions: FBT stack() and arg Tracing Kernel Functions: How the illumos AMD64 FBT Provider Intercepts Function Calls

How the illumos AMD64 FBT provider implements the stack() action and built-in arg variables.

A Request to Security Researchers from illumos Kebe Says: Dan McD's blog

A Gentle Reminder About illumos

A very bad security vulnerability in Solaris was patched-and-announced by Oracle earlier this week. Turns out, we in open-source-descendant illumos had something in the same neighborhood. We can’t confirm it’s the same bug because reverse-engineering Oracle Solaris is off the table.

In general, if a vulnerability is an old one in Solaris, there’s a good chance it’s also in illumos. Alex Wilson said it best in this recent tweet:

If you want to see the full history, the first 11 minutes of my talk from 2016’s FOSDEM explain WHY a sufficiently old vulnerability in Solaris 10, or even Solaris 11, may also be in illumos.

Remember folks, Solaris is closed-source under Oracle, even though it used to be open-source during the last years of Sun’s existence. illumos is open-source, related, but NOT the same as Solaris anymore. Another suggested talk covers this rather well, especially if you start at the right part.

The Actual Request

Because of this history and shared heritage, if you’re a security researcher, PLEASE make sure you find one of the many illumos distributions, install it, and try your proof-of-concept on that as well. If you find the same vulnerability in illumos, please report it to us via the security@illumos.org mailing alias. We have a PGP key too!

Thank you, and please test your Solaris exploits on illumos too (and vice-versa).

Tribblix on Vultr with Block Storage The Trouble with Tribbles...

I wrote a while back about installing Tribblix on Vultr using iPXE.

That all still works perfectly with the m23.2 release. And it's pretty quick too. It's even quicker on the new "High Frequency" servers, which appear to have a 3.8GHz Skylake chip and NVMe storage.

One of the snags with many of the smaller cloud providers is that they don't necessarily have a huge choice of instance configurations. I'm not saying I necessarily want the billions of choices that AWS provides, but the instances are fairly one-dimensional - CPU, memory, and storage aren't adjustable independently. This is one of the reasons I use AWS: I can add storage without having to pay for extra CPU or memory.

One option here is to take a small instance, and add extra storage to it. That's what I do on AWS, having a small root volume and adding a large EBS volume to give me the space I need. This isn't - yet - all that commonly available on other providers.

Vultr do have it as an option, but at the moment it's only available in New York (NJ). Let's deploy an instance, and select NY/NJ as the location.

Scroll down and choose the "Upload ISO" tab, select the iPXE radio button, and put in the ipxe.txt script location for the version of Tribblix you want to install.

I've chosen the 1GB instance size, which is ample. Deploy that, and then connect to the console.


View the console and watch the install. This is really quick if you deploy in London, and isn't too bad elsewhere in Europe, as the Tribblix server I'm loading from is in London. Transferring stuff across the Atlantic takes a bit longer.

Then run the install. This is just:

./live_install.sh -G c2t0d0


It will download a number of packages to finish the install (these are normally loaded off the ISO if you boot that, but for an iPXE install it pulls them over the network).

Reboot the server and it will boot normally off disk.

Go to the Block Storage tab, and Add some block storage. I'm going to add 50GB just to play with.


Now we need to connect it to our server.

This isn't quite as obvious as I would have liked. Click on the little pencil icon and you get to the "Manage Block Storage" page. Select the instance you want to attach it to, and hit Attach. This will restart the server, so make sure you're not doing anything.

The documentation talks about /dev/vdb and the like, which isn't the way we name our devices. As we're using vioblk, this comes up as c3t0d0 (the initial boot drive is c2t0d0).


We can create a pool:

zpool create -O compression=lz4 store c3t0d0

This took a little longer than I expected to complete.

And I can do all the normal things you would expect with a zfs pool.
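
For instance (an illustrative sketch; the dataset names are made up):

zfs create store/backups
zfs set atime=off store/backups
zfs create -o quota=10G store/scratch
zfs list -r store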

If you go to Block Storage and click the pencil to Manage it, the size is clickable. I clicked that, and changed the size to 64GB.

Like resizing an EBS volume on AWS, there doesn't seem to be a way to persuade illumos to rescan the devices to spot that the size has changed. You have to reboot.

Only a reboot doesn't appear to be enough. The Vultr console says "You will need to restart your server via the control panel before changes will be visible." and it appears to be correct on that.

(This is effectively power-cycling it, which is presumably necessary to propagate the change through the whole stack properly.)

After that, the additional space is visible, as you can see from the extra 14G in the EXPANDSZ column of the zpool list output.


And you can expand the pool using 'online -e':

zpool online -e store c3t0d0

This caused me a little bit of trouble: it appeared to generate an I/O error, lots of messages, and a hung console. I had to ssh in, clear the pool, and run a scrub before things looked sane. Expanding the pool then worked and things look OK.

Generally, block device resize appears to be possible, but is still a bit rough round the edges.

The state of Tribblix, 2020 The Trouble with Tribbles...

It's been a funny year, has 2020.

But amongst all this, work on Tribblix continues.

I released milestone 22 back in March. That was a fairly long time in the making, as the previous release was 9 months earlier. Part of the reason for the lengthy delay was that there wasn't all that much reason for a new release - there are a lot of updated packages, but no big items. I guess the biggest thing is that the default gcc compiler and runtime went from gcc4 to gcc7. (In places, the gcc4 name continues.)

Milestone 23 was the next full release, in July. Things start to move again here - Tribblix fully transitioned from gcc4 to gcc7, as illumos is now a gcc7 build. I updated the MATE desktop, which was the start of moving from gtk2 to gtk3. There's a prettier boot banner, which allows a bit of custom branding.

There's a long-running effort to migrate from Python 2.x to 3.x. This is slow going - there are actually quite a lot of python modules and tools (and things that use python) that still show no sign of engaging with the Python 3 shift. But I'm gradually making sure that everything that can be version 3 is, and removing the python 2 pieces where possible. This is getting a bit more complicated - as of Python 3.8 I've switched from 32-bit to 64-bit. And now that they're doing time-based releases, there will be a version bump to navigate every year, just to add to the work.

Most of the Tribblix releases have been full upgrades from one version to the next. With the milestone 20 series, I had update releases, which allowed a shared stream of userland packages, while allowing illumos updates to take place. The same model is true of milestone 23 - update 1 came along in September.

With Milestone 23 update 1 we fixed the bhyve CVE. Beyond the normal updates, I added XView, which suits the retro theme and which quite a few people have asked for.

Immediately after that (it was supposed to be in 23.1 but wasn't quite ready) came another major update: refreshing the X server stack.

When Tribblix was created, I didn't have the resources to build everything from scratch straight away, so "borrowed" a few components from OpenIndiana (initially 151a8, then 151a9) just to make sure I had enough bits to provide a complete OS. Many of the isolated components were replaced fairly quickly over time, but the X11 stack was the big holdout. It was finally time to build Xorg and the drivers myself. It wasn't too difficult, but to be honest I have no real way to test most of it. So that will all be present in 23.2.

One reason for doing this - and my hand was forced a little here - is that I've also updated Xfce from 4.12 to 4.14. That's also a gtk2 to gtk3 switch, but Xfce 4.14 simply didn't work on the old Xorg I had before.

Something else I've put together - and these are all gtk3 based - is a lightweight desktop, using abiword, geany, gnumeric, grisbi, imagination, and netsurf. You still need a file manager to round out the set, and I really haven't found anything that's lightweight and builds successfully, so at the moment this is really an adjunct to MATE or Xfce.

Alongside all this I've been working on keeping Java OpenJDK working on illumos. They ripped out Solaris support early in the year, but I've been able to put that back. The real killer here was Studio support, and we don't want that anyway (it's not open source, and the binaries no longer run). There are other unix-like variants supported by Java, running on the x86 architecture with a gcc toolchain, just like us, so it shouldn't be that much of a mountain to climb.

Support for SPARC is currently slightly on the back burner, partly because the big changes mentioned above aren't really relevant for SPARC, partly due to less time, partly due to the weather - running SPARC boxes in the home office tends to be more of a winter than a summer pursuit, due to the heat.


Rust after the honeymoon The Observation Deck

Two years ago, I had a blog entry describing falling in love with Rust. Of course, a relationship with a technology is like any other relationship: as novelty and infatuation wear off, it can get on a longer term (and often more realistic and subdued) footing — or it can begin to fray. So one might well ask: how is Rust after the honeymoon?

By way of answering that, I should note that about a year ago (and a year into my relationship with Rust) we started Oxide. On the one hand, the name was no accident — we saw Rust playing a large role in our future. But on the other, we hadn’t yet started to build in earnest, so it was really more a pointed question than an assertion: where might Rust fit in a stack that stretches from the bowels of firmware through a hypervisor and control plane and into the lofty heights of REST APIs?

The short answer from an Oxide perspective is that Rust has proven to be a really good fit — remarkably good, honestly — at more or less all layers of the stack. You can expect much, much more to come from Oxide on this (we intend to open source more or less everything we’re building), but for a teaser of the scope, you can see it in the work of Oxide engineers: see Cliff’s blog, Adam and Dave’s talk on Dropshot, Jess on using Dropshot within Oxide, Laura on Rust macros, and Steve Klabnik on why he joined Oxide. (Requisite aside: we’re hiring!)

So Rust is going really well for us at Oxide, but for the moment I want to focus on more personal things — reasons that I personally have enjoyed implementing in Rust. These run the gamut: some are tiny but beautiful details that allow me to indulge in the pleasure of the craft; some are much more profound features that represent important advances in the state of the art; and some are bodies of software developed by the Rust community, notable as much for their reflection of who is attracted to Rust (and why) as for the artifacts themselves. It should also be said that I stand by absolutely everything I said two years ago; this is not a replacement for that list, but rather a supplement to it. Finally, this list is highly incomplete; there’s a lot to love about Rust and this shouldn’t be thought of as in any way exhaustive!

1. no_std

When developing for embedded systems — and especially for the flotilla of microcontrollers that surround a host CPU on the kinds of servers we’re building at Oxide — memory use is critical. Historically, C has been the best fit for these applications just because it is so lean: by providing essentially nothing other than the portable assembler that is the language itself, it avoids the implicit assumptions (and girth) of a complicated runtime. But the nothing that C provides reflects history more than minimalism; it is not an elegant nothing, but rather an ill-considered nothing that leaves those who build embedded systems building effectively everything themselves — and in a language that does little to help them write correct software.

Meanwhile, having been generally designed around modern machines with seemingly limitless resources, higher-level languages and environments are simply too full-featured to fit into (say) tens of kilobytes or into the (highly) constrained environment of a microcontroller. And even where one could cajole these other languages into the embedded use case, it has generally been as a reimplementation, leaving developers on a fork that isn’t necessarily benefiting from development in the underlying language.

Rust has taken a different approach: a rich, default standard library but also a first-class mechanism for programs to opt out of that standard library. By marking themselves as no_std, programs confine themselves to the functionality found in libcore. This functionality, in turn, makes no system assumptions — and in particular, performs no heap allocations. This is not easy for a system to do; it requires extraordinary discipline by those developing it (who must constantly differentiate between core functionality and standard functionality) and a broad empathy with the constraints of embedded software. Rust is blessed with both, and the upshot is remarkable: a safe, powerful language that can operate in the highly constrained environment of a microcontroller — with binaries every bit as small as those generated by C. This makes no_std — as Cliff has called it — the killer feature of embedded Rust, without real precedent or analogue.
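
To give a flavour of what opting out looks like, here is a minimal no_std skeleton. This is a generic bare-metal sketch of my own, not anything Oxide-specific; the entry point details vary by platform:

    #![no_std]
    #![no_main]

    use core::panic::PanicInfo;

    // With no runtime underneath us, even panic behavior is ours to define.
    #[panic_handler]
    fn panic(_info: &PanicInfo) -> ! {
        loop {}
    }

    // There is no main() as such; the entry point is platform-defined.
    #[no_mangle]
    pub extern "C" fn _start() -> ! {
        loop {}
    }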

2. {:#x?}

Two years ago, I mentioned that I love format!, and in particular the {:?} format specifier. What took me longer to discover was {:#?}, which formats a structure but also pretty-prints it (i.e., with newlines and indentation). This can be coupled with {:#x} to yield {:#x?} which pretty-prints a structure in hex. So this:


    println!("dumping {:#x?}", region);

Becomes this:


dumping Region {
    daddr: Some(
        0x4db8,
    ),
    base: 0x10000,
    size: 0x8000,
    attr: RegionAttr {
        read: true,
        write: false,
        execute: true,
        device: false,
        dma: false,
    },
    task: Task(
        0x0,
    ),
}

My fingers now type {:#x?} by default, and hot damn is it ever nice!

3. Integer literal syntax

Okay, another small one: I love the Rust integer literal syntax! In hardware-facing systems, we are often expressing things in terms of masks that ultimately map to binary. It is beyond me why C thought to introduce octal and hexadecimal but not binary in its literal syntax; Rust addresses this gap with the same “0b” prefix as found in some non-standard C compiler extensions. Additionally, Rust allows for integer literals to be arbitrarily intra-delimited with an underscore character. Taken together, this allows for a mask consisting of bits 8 through 10 and bit 12 (say) to be expressed as 0b0000_1011_1000_0000 — which to me is clearer as to its intent and less error prone than (say) 0xb80 or 0b101110000000.

And as long as we’re on the subject of integer literals: I also love that the types (and the suffix that denotes a literal’s type) explicitly encode bit width and signedness. Instead of dealing with the implicit signedness and width of char, short, long and long long, we have u8, u16, u32, u64, etc. Much clearer!
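
To make that concrete, here is a small illustrative snippet (the values are chosen arbitrarily):

    // The same mask, in binary with underscores and in hex:
    let mask: u16 = 0b0000_1011_1000_0000;
    assert_eq!(mask, 0xb80);

    // Underscores work in any base, and suffixes encode width and signedness:
    let big = 1_000_000u64;
    let byte = 0xffu8;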

4. DWARF support

Debugging software — and more generally, the debuggability of software systems — is in my marrow; it may come as no surprise that one of the things that I personally have been working on is the debugger for a de novo Rust operating system that we’re developing. To be useful, debuggers need help from the compiler in the way of type information — but this information has been historically excruciating to extract, especially in production systems. (Or as Robert phrased it concisely years ago: “the compiler is the enemy.”) And while DWARF is the de facto standard, it is only as good as the compiler’s willingness to supply it.

Given how much debuggability can (sadly) lag development, I wasn’t really sure what I would find with respect to Rust, but I have been delighted to discover thorough DWARF support. This is especially important for Rust because it (rightfully) makes extensive use of inlining; without DWARF support to make sense of this inlining, it can be hard to make any sense of the generated assembly. I have been able to use the DWARF information to build some pretty powerful Rust-based tooling — with much promise on the horizon. (You can see an early study for this work in Tockilator.)

5. Gimli and Goblin

Lest I sound like I am heaping too much praise on DWARF, let me be clear that DWARF is historically acutely painful to deal with. The specification (to the degree that one can call it that) is an elaborate mess, and the format itself seems to go out of its way to inflict pain on those who would consume it. Fortunately, the Gimli crate that consumes DWARF is really good, having made it easy to build DWARF-based tooling. (I have found that whenever I am frustrated with Gimli, I am, in fact, frustrated with some strange pedantry of DWARF — which Gimli rightfully refuses to paper over.)

In addition to Gimli, I have also enjoyed using Goblin to consume ELF. ELF — in stark contrast to DWARF — is tight and crisp (and the traditional C-based tooling for ELF is quite good), but it was nice nonetheless that Goblin makes it so easy to zing through an ELF binary.
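
As a taste, here is a minimal sketch of consuming an ELF binary with Goblin, assuming the goblin crate as a dependency (the path is arbitrary):

    use goblin::Object;

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        let buffer = std::fs::read("/bin/ls")?;
        match Object::parse(&buffer)? {
            Object::Elf(elf) => {
                // A couple of basics pulled from the parsed headers.
                println!("entry point: {:#x}", elf.entry);
                println!("{} section headers", elf.section_headers.len());
            }
            _ => println!("not an ELF binary"),
        }
        Ok(())
    }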

6. Data-bearing enums

Enums — that is, the “sum” class of algebraic types — are core to Rust, and give it the beautiful error handling that I described falling in love with two years ago. Algebraic types allow much more than just beautiful error handling, e.g. Rust’s ubiquitous Option type, which allows for sentinel values to be eliminated from one’s code — and with it some significant fraction of defects. But it’s one thing to use these constructs, and another to begin to develop algebraic types for one’s own code, and I have found the ability for enums to optionally bear data to be incredibly useful. In particular, when parsing a protocol, one is often taking a stream of bytes and turning it into one of several different kinds of things; it is really, really nice to have the type system guide how software should consume the protocol. For example, here’s an enum that I defined when parsing data from ARM’s Embedded Trace Macrocell signal protocol:


#[derive(Copy, Clone, Debug)]
pub enum ETM3Header {
    BranchAddress { addr: u8, c: bool },
    ASync,
    CycleCount,
    ISync,
    Trigger,
    OutOfOrder { tag: u8, size: u8 },
    StoreFailed,
    ISyncCycleCount,
    OutOfOrderPlaceholder { a: bool, tag: u8 },
    VMID,
    NormalData { a: bool, size: u8 },
    Timestamp { r: bool },
    DataSuppressed,
    Ignore,
    ValueNotTraced { a: bool },
    ContextID,
    ExceptionExit,
    ExceptionEntry,
    PHeaderFormat1 { e: u8, n: u8 },
    PHeaderFormat2 { e0: bool, e1: bool },
}

That variants can have wildly different types (and that some can bear data while others don’t — and some can be structured, while others are tuples) allows for the type definition to closely match the specification, and helps higher-level software consume the protocol correctly.
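
To illustrate (a sketch of mine, not the actual parser): because each variant carries exactly the data it needs, consuming a header is a single match, and the compiler insists that every variant be handled:

    fn describe(header: &ETM3Header) -> String {
        match header {
            ETM3Header::BranchAddress { addr, c } => {
                format!("branch address: addr={:#x} c={}", addr, c)
            }
            ETM3Header::OutOfOrder { tag, size } => {
                format!("out of order: tag={} size={}", tag, size)
            }
            ETM3Header::Timestamp { r } => format!("timestamp: r={}", r),
            // The derived Debug impl covers the remaining variants.
            other => format!("{:?}", other),
        }
    }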

7. Ternary operations

In C, the ternary operator allows for a terse conditional expression that can be used as an rvalue, e.g.:


        x = is_foo ? foo : bar;

This is equivalent to:


        if (is_foo) {
                x = foo;
        } else {
                x = bar;
        }

This construct is particularly valuable when not actually assigning to an lvalue, but when (for example) returning a value or passing a parameter. And indeed, I would estimate that a plurality — if not a majority — of my lifetime-use of the ternary operator has been in arguments to printf.

While Rust has no ternary operator per se, it is expression-oriented: statements have values. So the above example becomes:


        x = if is_foo { foo } else { bar };

That’s a bit more verbose than its C equivalent (though I personally like its explicitness), but it really starts to shine when things get marginally more complicated: nested ternary operators get gnarly in C, but they are easy to follow as simple nested if-then-else statements in Rust. And (of course) match is an expression as well — and I found that I often use match where I would have used a ternary operator in C, with the added benefit that I am forced to deal with every case. As a concrete example, take this code that is printing a slice of little-endian bytes as an 8-bit, 16-bit, or 32-bit quantity depending on a size parameter:


    print!("{:0width$x} ",
        match size {
            1 => line[i - offs] as u32,
            2 => u16::from_le_bytes(slice.try_into().unwrap()) as u32,
            4 => u32::from_le_bytes(slice.try_into().unwrap()) as u32,
            _ => {
                panic!("invalid size");
            }
        },
        width = size * 2
    );

For me, this is all of the power of the ternary operator, but without its pitfalls!

An interesting footnote on this: Rust once had the C-like ternary operator, but removed it, as the additional syntax didn’t carry its weight. This pruning in Rust’s early days — the idea that syntax should carry its weight by bringing unique expressive power — has kept Rust from the fate of languages that suffered from debilitating addictions to new syntax and concomitant complexity overdose; when there is more than one way to do it for absolutely everything, a language becomes so baroque as to become write-only!

8. paste!

This is a small detail, but one that took me a little while to find. As I described in my blog entry two years ago, I have historically made heavy use of the C preprocessor. One (arcane) example of this is the ## token concatenation operator, which I have needed only rarely — but found essential in those moments. (Here’s a concrete example.) As part of a macro that I was developing, I found that I needed the equivalent for Rust, and was delighted to find David Tolnay’s paste crate. paste! was exactly what I needed — and more testament to both the singular power of Rust’s macro system and David’s knack for building singularly useful things with it!
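
For a flavor of what this looks like, here is a hypothetical accessor-generating macro of my own invention (not the macro from that work):

    use paste::paste;

    struct Config {
        width: u32,
    }

    macro_rules! make_getter {
        ($field:ident) => {
            paste! {
                // [<...>] concatenates tokens into a new identifier,
                // yielding, in this case, fn get_width().
                pub fn [<get_ $field>](&self) -> u32 {
                    self.$field
                }
            }
        };
    }

    impl Config {
        make_getter!(width);
    }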

9. unsafe

A great strength of Rust is its safety — but something I also appreciate about it is the escape hatch offered via unsafe, whereby certain actions are permitted that are otherwise disallowed. It should go without saying that one should not use unsafe without good reason — but such good reasons can and do exist, and I appreciate that Rust trusts the programmer enough to allow them to take their safety into their own hands. Speaking personally, most of my own uses of unsafe have boiled down to accesses to register blocks on a microcontroller: on the one hand, unsafe because they dereference arbitrary memory — but on the other, safe by inspection. That said, the one time I had to write unsafe code that actually felt dangerous (namely, in dealing with an outrageously unsafe C library), I was definitely in a heightened state of alert! Indeed, my extreme caution around unsafe code reflects how much Rust has changed my disposition: after nearly three decades working in C, I thought I appreciated its level of unsafety, but the truth is I had just become numb to it; to implement in Rust is to eat the fruit from the tree of knowledge of unsafe programs — and to go back to unsafe code is to realize that you were naked all along!
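
For the curious, that register-block pattern boils down to something like this sketch (the address and register name are invented for illustration):

    // Hypothetical memory-mapped status register on some microcontroller.
    const GPIO_STATUS: *const u32 = 0x4002_0010 as *const u32;

    fn gpio_status() -> u32 {
        // Unsafe because we dereference arbitrary memory; safe by inspection
        // because the address is a documented, always-readable device register.
        unsafe { core::ptr::read_volatile(GPIO_STATUS) }
    }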

10. Multi-platform support

When Steve Klabnik joined Oxide, we got not only an important new addition to the team, but a new platform as well: Steve is using Windows as his daily driver, in part because of his own personal dedication to keeping Rust multi-platform. While I’m not sure that anything could drive me personally to use Windows (aside: MS-DOS robbed me of my childhood), I do strongly believe in platform heterogeneity. I love that Rust forces the programmer to really think about implicitly platform-specific issues: Rust refuses to paper over the cracks in computing’s foundation for sake of expediency. If this can feel unnecessarily pedantic (can’t I just have a timestamp?!), it is in multi-platform support where this shines: software that I wrote just… worked on Windows. (And where it didn’t, it was despite Rust’s best efforts: when a standard library gives you first-class support to abstract the path separator, you have no one to blame but yourself if you hard-code your own!)

Making and keeping Rust multi-platform is hard work for everyone involved; but as someone who is currently writing Rust for multiple operating systems (Linux, illumos and — thanks to Steve — Windows) and multiple ISAs (e.g., x86-64, ARM Thumb-2), I very much appreciate that this is valued by the Rust community!

11. anyhow! + RUST_BACKTRACE

In my original piece, I praised the error handling of Rust, and that is certainly truer than ever: I simply cannot imagine going back to a world without algebraic types for error handling. The challenge that remained was that there were several conflicting crates building different error types and supporting routines, resulting in some confusion as to best practice. All of this left me — like many — simply rolling my own via Box<dyn Error>, which works well enough, but doesn’t really help with a thorny question: when an error emerges deep within a stack of composed software, where did it actually come from?

Enter David Tolnay (again!) and his handy anyhow! crate, which pulls together best practices and ties that into the improvements in the std::error::Error trait to yield a crate that is powerful without being imposing. Now, when an error emerges from within a stack of software, we can get a crisp chain of causality, e.g.:


readmem failed: A core architecture specific error occurred

Caused by:
    0: Failed to read register CSW at address 0x00000000
    1: Didn't receive any answer during batch processing: [Read(AccessPort(0), 0)]
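
That chain is built by attaching context as errors propagate; here is a minimal sketch of the pattern (the function and file name are made up):

    use anyhow::{Context, Result};

    fn read_config(path: &str) -> Result<String> {
        // On failure, the underlying I/O error becomes the "Caused by" entry.
        std::fs::read_to_string(path)
            .with_context(|| format!("failed to read config at {}", path))
    }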

And we can set RUST_BACKTRACE to get a full backtrace where an error actually originates — which is especially useful when a failure emerges from a surprising place, like this one from a Drop implementation in probe-rs:


Stack backtrace:
   0: probe_rs::probe::daplink::DAPLink::process_batch
   1: probe_rs::probe::daplink::DAPLink::batch_add
   2: ::read_register
   3: probe_rs::architecture::arm::communication_interface::ArmCommunicationInterface::read_ap_register
   4: probe_rs::architecture::arm::memory::adi_v5_memory_interface::ADIMemoryInterface::read_word_32
   5: <probe_rs::architecture::arm::memory::adi_v5_memory_interface::ADIMemoryInterface as probe_rs::memory::MemoryInterface>::read_word_32
   6: ::get_available_breakpoint_units
   7: <core::iter::adapters::ResultShunt<I> as core::iter::traits::iterator::Iterator>::next
   8: <alloc::vec::Vec as alloc::vec::SpecFromIter>::from_iter
   9: ::drop
  10: core::ptr::drop_in_place
  11: main
  12: std::sys_common::backtrace::__rust_begin_short_backtrace
  13: std::rt::lang_start::{{closure}}
  14: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once
  15: main
  16: __libc_start_main
  17: _start) })

12. asm!

When writing software at the hardware/software interface, there is inevitably some degree of direct machine interaction that must be done via assembly. Historically, I have done this via dedicated .s files — which are inconvenient, but explicit.

Over the years, compilers added the capacity to drop assembly into C, but the verb here is apt: the resulting assembly was often dropped on its surrounding C like a Looney Tunes anvil, with the interface between the two often being ill-defined, compiler-dependent or both. Rust took this approach at first too, but it suffered from all of the historical problems of inline assembly — which in Rust’s case meant being highly dependent on LLVM implementation details. This in turn meant that it was unlikely to ever become stabilized, which would relegate those who need inline assembly to forever be on nightly Rust.

Fortunately, Amanieu d’Antras took on this gritty problem, and landed a new asm! syntax. The new syntax is a pleasure to work with, and frankly Rust has now leapfrogged C in terms of ease and robustness of integrating inline assembly!
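
Here is a trivial example of the new syntax (x86-64; nightly-only at the time of writing):

    #![feature(asm)]

    fn five() -> u64 {
        let x: u64;
        unsafe {
            // A template string with explicit operand specs, not LLVM soup.
            asm!("mov {0}, 5", out(reg) x);
        }
        x
    }

    fn main() {
        assert_eq!(five(), 5);
    }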

13. String continuations

Okay, this is another tiny one, but meaningful for me and one that took me too long to discover. So first, something to know about me: I am an eighty column purist. For me, this has nothing to do with punchcards or whatnot, but rather with type readability, which tends to result in 50-100 characters per line — and generally about 70 or so. (I would redirect rebuttals to your bookshelf, where most any line of most any page of most any book will show this to be more or less correct.) So I personally embrace the “hard 80”, and have found that the rework that this can sometimes require results in more readable, better factored code. There is, however, one annoying exception to this: when programmatically printing a string that is itself long, one is left with much less horizontal real estate to work with! In C, this is a snap: string literals without intervening tokens are automatically concatenated, so the single literal can be made by multiple literals across multiple lines. But in Rust, string literals can span multiple lines (generally a feature!), so splitting the line will also embed the newline and any leading whitespace. e.g.:


    println!(
        "...government of the {p}, by the {p}, for the {p},
        shall not perish from the earth.",
        p = "people"
    );

Results in a newline and some leading whitespace that represent the structure of the program, not the desired structure of the string:


...government of the people, by the people, for the people,
        shall not perish from the earth.

I have historically worked around this by using the concat! macro to concatenate two (or more) static strings, which works well enough, but looks pretty clunky, e.g.:


    println!(
        concat!(
            "...government of the {p}, by the {p}, for the {p}, ",
            "shall not perish from the earth."
        ),
        p = "people"
    );

As it turns out, I was really overthinking it, though it took an embarrassingly long time to discover: Rust has support for continuation of string literals! If a line containing a string literal ends in a backslash, the literal continues on the next line, with the newline and any leading whitespace elided. This is one of those really nice things that Rust lets us have; the above example becomes:


    println!(
        "...government of the {p}, by the {p}, for the {p}, \
        shall not perish from the earth.",
        p = "people"
    );

So much cleaner!

14. --pretty=expanded and cargo expand

In C — especially C that makes heavy use of the preprocessor — the -E option can be invaluable: it stops the compilation after the preprocessing phase and dumps the result to standard output. Rust, as it turns out, has an equivalent in the --pretty=expanded unstable compiler option. The output of this can be a little hard on the eyes, so you want to send it through rustfmt — but the result can be really enlightening as to how things actually work. Take, for example, the following program:


fn main() {
    println!("{} has been quite a year!", 2020);
}

Here is the --pretty=expanded output:


$ rustc -Z unstable-options --pretty=expanded year.rs | rustfmt --emit stdout
#![feature(prelude_import)]
#![no_std]
#[prelude_import]
use std::prelude::v1::*;
#[macro_use]
extern crate std;
fn main() {
    {
        ::std::io::_print(::core::fmt::Arguments::new_v1(
            &["", " has been quite a year!\n"],
            &match (&2020,) {
                (arg0,) => [::core::fmt::ArgumentV1::new(
                    arg0,
                    ::core::fmt::Display::fmt,
                )],
            },
        ));
    };
}

As an aside, format_args! is really magical — and a subject that really merits its own blog post from someone with more expertise on the subject. (Yes, this is the Rust blogging equivalent of Chekhov’s gun!)

With so many great David Tolnay crates, it’s fitting we end on one final piece of software from him: cargo expand is a pleasant wrapper around --pretty=expanded that (among other things) allows you to only dump a particular function.
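
Usage is as simple as you would hope; a quick sketch, where main is the item being expanded:

$ cargo install cargo-expand
$ cargo expand main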

The perfect marriage?

All of this is not to say that Rust is perfect; there are certainly some minor annoyances (rustfmt: looking at you!), and some forthcoming features that I eagerly await (e.g., safe transmutes, const generics). And in case it needs to be said: just because Rust makes it easier to write robust software doesn’t mean that it makes it impossible to write shoddy software!

Dwelling on the imperfections, though, would be a mistake. When getting into a long-term relationship with anything — be it a person, or a company, or a technology — it can be tempting to look at its surface characteristics: does this person, or company or technology have attributes that I do or don’t like? And those are important, but they can be overemphasized: because things change over time, we sometimes look too much at what things are rather than what guides them. And in this regard, my relationship with Rust feels particularly sound: it feels like my values and Rust’s values are a good fit for one another — and that my growing relationship with Rust will be one of the most important of my career!

Email vs. Tool du Jour Josef "Jeff" Sipek

TL;DR: Just because email is decades old doesn’t mean that it cannot serve a vital role in modern project management, research, development, and support.

Ultimately, working on a project requires communication—and lots of it. Communication with peers, with managers, with other departments within the company, and even with customers. It is tempting to grab the Tool du Jour and add it to your ever-growing arsenal of tools believing it will make communication easier. Often, it does not.

For example, let’s consider these tools: Jira, Confluence, Slack, Zoom, GitHub/GitLab, phone, and email.

Does your company use these tools or their equivalents? Isn’t it a bit overkill to have 7 different channels of communication? Sure, often one tool is better at a particular mode of communication than the others, but there is significant overlap.

Do you want a video chat? Do you use Slack or Zoom?

Voice chat? Slack, Zoom, or phone?

Do you want to ask a question related to a bug? Do you use Jira, Slack, or just set up a call? Voice or video? Or would email be best?

Do you keep track of your project via high-level Jira issues? Or do you use a set of Confluence pages where you include various semi-autogenerated plots?

Decision fatigue is real. Do you want your (rather expensive) employees to waste their cognitive capacity deciding which tool to use? Or do you want them to make the product better?

It is painful how many times over the past ten years I’ve witnessed conversations that went much like this:

A: Can you answer the question I left in Jira?
B: <B reads Jira question> Oh, that is answered on the ABC123 Confluence page.
A: Ah. Can you make a note of that in Jira? Thanks.

This example involves three communication channels—Jira, Confluence, and some chat system.

This sort of communication fragmentation is really bad. Not only does it waste a lot of time with exchanges like this example, it also makes searching for information essentially impossible. Who in their right mind would search half a dozen tools (with various degrees of search capability) for something? It is simply easier to just ask your coworkers. After all, their time is less valuable to you than your own time and sanity.

So, what can be done to improve things?

Well, if at all possible do not use tools that have duplicate functionality. If you have to, hopefully you can disable the duplicate functionality. If there is no way to disable it, then you must make it painfully clear where such communication should go. Hopefully this can be done via automated hooks that somehow notify the user. For example, automatically closing issues opened in the wrong bug tracker (e.g., opened in GitHub instead of Jira), or automatically responding to wiki commenters directing them to the proper medium for wiki discussion. Finally, if all else fails, have someone (ideally a manager or team lead, so the notification has some weight to it) manually make sure that anyone that uses the functionality is told not to.

This reduction in the number of tools should also help with responsiveness. It is no secret that the average human can hold only about 7 things in working memory at the same time. How many of those do you want to dedicate to tooling? If I have to remember to check 7 different tools periodically, one of two things happens: either I manage to check them all but accomplish nothing else, or I get things done but only remember about 2 or 3 tools.

That should help with quite a bit of the fragmentation. Now “all” that’s left to do is decide which communication channel is used for what.

I have concluded that there are four major levels of communication:

  1. important, synchronous
  2. important, asynchronous
  3. unimportant, synchronous
  4. unimportant, asynchronous

I’m using the term “synchronous” to mean that you want the back-and-forth latency to be low, and “important” to mean that you must have an answer. Note that “unimportant” does not mean off-topic, but rather lower priority.

Why make the synchronous/asynchronous distinction? For multiple reasons. First of all, being interrupted in the middle of something is costly. It takes a significant amount of time to get back “into the zone” but only a fraction of a minute to get out of it. Would you rather pay your employees to try to work or to actually work? And second, asynchrony makes communication across time zones easier. Not easy, but easier.

So, let me go through the four major levels of communication one by one and share my opinion about what works and why.

important, synchronous
If you want to have a (relatively) rapid back-and-forth, you pretty much have to use an in-person meeting or a voice/video call. A one-to-one (i.e., non-group) chat can also possibly work, but there will be temptation to multi-task. This desire to multi-task implies that the chat isn’t actually that important.
important, asynchronous
When you don’t require having the answers immediately or when it simply isn’t possible to get everyone in the same “room” at the same time for a meeting (in person, voice, or video), email is probably the best communication method. Each person can read it and possibly reply at the most convenient time for them.
unimportant, synchronous
This is the form of communication that includes various chit-chat, sanity checking polls (e.g., “would anyone object if I tried xyz?”), and so on. It lets you quickly get bits of information, but in a way it is unreliable. Not everyone is reading the chat when you say something and when it scrolls off the screen it is as if you never said it. In other words, do not expect anyone to read the group chat messages from when they were away. If you want someone specific or even everyone to see a particular message, it is not an unimportant message. One-to-one chat is a little different since it is more “reliable”, but usually anything substantial that is important will end up with a call instead.
unimportant, asynchronous
Finally, all the things you’d like others to see at some point in the future should be sent as an email. The recipients will read it when they get to it, and since it isn’t important it probably doesn’t even require a reply.

These four levels are, of course, not set in stone. It is possible (and I’d even encourage it) to upgrade or downgrade your communication as needed. For example, it is perfectly reasonable to ask in chat if there are objections or obvious issues with a particular approach, function, or workload. Then, if the responses don’t make it obviously a terrible idea but a more definitive discussion is desired, a similar (but more detailed) version can be sent via email. In essence, upgrading it from “unimportant synchronous” to “important asynchronous”. (Caution: don’t overdo these upgrades/downgrades.)

As you can see, I think that email is a good choice for any asynchronous communication. That’s for good reasons. Everyone has an email address, everyone knows how to use it (at least a little), and the free-form format allows you to use the most appropriate content type to get your point across—be it ASCII art, images, or even Excel spreadsheets. In other words:

Email is ubiquitous.

Email works remarkably well.

Email is extremely flexible.

As a real world example, consider that pretty much every company-wide announcement (important or not) has been made either in a huge meeting or via email. Often the meeting-time announcements are followed up by an email anyway! It’s not a chat message. It’s not a Confluence page. It’s not a Jira issue. It’s an email.

Before I conclude, I’d like to address two slightly more specific cases.

First, what about issue tracking? How does that tie into my email-centric world? Well, you can keep your issue tracker, but in my opinion, the comments feature should not be used. If a ticket needs to be discussed, send an email, set up a conference call, whatever works for you—just don’t use the comments on the issue. If you look at any issue in your issue tracker, the comments will fall into one of two categories. Either there are none or there are many, and it is painfully clear that people don’t read them and ask the same questions over and over. Instead of burying progress reports or updates to the understanding of the issue in a comment that nobody will ever read, that information should be used to reword the issue description. The same largely applies to other tools’ comments sections as well.

Second is a concern that people will not read all those emails. I think this is only a problem if there are too many tools and email isn’t viewed as an important one. If (unrealistically) all communication happens through email, then right after communicating with someone, the person is already in the right tool to handle the next communication. If code reviews, support requests, and everything else were to go to the same place, there is nearly zero context switching cost. Even if the person goes to use a different tool (e.g., a text editor or an IDE), when that work is done, they’ll return to their email client. In other words, if you make the email communication channel important, your emails will get read. If you don’t make it important, then you (individually) are better off using a channel the recipient considers important. In an environment with too many tools, each recipient may have a different preference.

Just to make it painfully clear, I am not advocating killing off everything except email. Instead, I’m advocating killing off tools that duplicate functionality, and shifting all asynchronous communication to a single medium. In my experience, the most efficient (and least disruptive) asynchronous communication medium is email. And therefore it should not only be one of the tools that survives the culling, but it also should be the one that is embraced afterwards.

That’s it for today. In the next post, I’ll talk about what I consider the ideal code review workflow.

OmniOS Community Edition r151030bw, r151032aw, r151034w OmniOS Community Edition

OmniOS weekly releases for w/c 5th of October 2020 are now available.

  • For all supported OmniOS releases, pkg has been enhanced to allow for more detailed diagnostic messages via -vv in the event that an upgrade solution can’t be found.

With IPS, it’s possible to get a situation where an upgrade cannot be performed and the error message is not very informative, for example:

r151034% pfexec pkg update -nv
Creating Plan (Running solver): |
pkg update: No solution was found to satisfy constraints
No solution found to update to latest available versions.
This may indicate an overly constrained set of packages are installed.

latest incorporations:

  pkg://omnios/consolidation/osnet/osnet-incorporation@0.5.11,5.11-151036.0:20201002T085228Z
  pkg://omnios/developer/illumos-tools@11,5.11-151036.0:20201002T071229Z
  pkg://omnios/developer/omnios-build-tools@11,5.11-151036.0:20201002T071709Z
  pkg://omnios/entire@11,5.11-151036.0:20201002T072426Z
  pkg://omnios/incorporation/jeos/illumos-gate@11,5.11-151036.0:20201002T085700Z
  pkg://omnios/incorporation/jeos/omnios-userland@11,5.11-151036.0:20201002T072614Z

Dependency analysis is unable to determine the cause.
Try specifying expected versions to obtain more detailed error messages.

This change adds new behaviour when -v is provided a second time, showing a better error message that should point directly to the problem:

r151034% pfexec pkg update -nvv
Retrieving package list...
Retrieving list of packages to update...
Creating Plan (Solver setup): /
pkg update: Package 'ooce/extra-build-tools' must be uninstalled or upgraded if the requested operation is to be performed.
  Reject:  pkg://extra.omnios/ooce/extra-build-tools@11-151034.0
  Reason:  No version for 'conditional' dependency on ooce/x11/header/xcb-protocols can be found
Package 'ooce/omnios-build-tools' must be uninstalled or upgraded if the requested operation is to be performed.
  Reject:  pkg://extra.omnios/ooce/omnios-build-tools@11-151034.0
  Reason:  No version for 'conditional' dependency on ooce/x11/header/x11-protocols can be found

For further details, please see https://omniosce.org/releasenotes


Any problems or questions, please get in touch.

Using the 'zadm' utility to create a sparse zone from a template OmniOS Community Edition

This week’s ascii-cast shows how to use zadm to create a sparse-branded zone in OmniOS r151034 using a prepared JSON file, along with a memory cap and CPU shares.
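
While the details are in the video, the prepared JSON file is the heart of it. The sketch below is illustrative only, with made-up names and values; consult the zadm documentation for the exact schema:

{
    "zonename": "sparse0",
    "zonepath": "/zones/sparse0",
    "brand": "sparse",
    "autoboot": "false",
    "capped-memory": {"physical": "2G"},
    "cpu-shares": "8"
}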


zadm is open source and hosted on GitHub. Feedback and pull requests are welcome.

Any questions, please get in touch!