Disabling Monospaced Font Ligatures Josef "Jeff" Sipek

A recent upgrade of FreeBSD on my desktop resulted in just about every program (Firefox, KiCAD, but thankfully not urxvt) rendering various ligatures even for monospaced fonts. Needless to say, this is really annoying when looking at code, etc. Not having any better ideas, I asked on Mastodon if anyone knew how to turn this mis-feature off.

About an hour later, @monwarez@bsd.cafe suggested dropping the following XML in /usr/local/etc/fonts/conf.avail/29-local-noto-mono-fixup.conf and adding a symlink in ../conf.d to enable it:

<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "urn:fontconfig:fonts.dtd">
<fontconfig>
        <description>Disable ligatures for monospaced fonts to avoid ff, fi, ffi, etc. becoming only one character wide</description>
        <match target="font">
                <test name="family" compare="eq">
                        <string>Noto Sans Mono</string>
                </test>
                <edit name="fontfeatures" mode="append">
                        <string>liga off</string>
                        <string>dlig off</string>
                </edit>
        </match>
</fontconfig>

This solved my problem. Hopefully this will help others. If not, it’s a note-to-self for when I need to reapply this fixup :)

Moore's Scofflaws Oxide Computer Company Blog

Years ago, Jeff Bezos famously quipped that "your margin is my opportunity." This was of course aimed not at Amazon’s customers, but rather its competitors, and it was deadly serious: customers of AWS in those bygone years will fondly remember that every re:Invent brought with it another round of price cuts. This era did not merely reflect Bezos’s relentless execution, but also a disposition towards who should reap the reward of advances in underlying technology: Amazon believed (if implicitly) that improvements at the foundations of computing (e.g., in transistor density, core count, DRAM density, storage density, etc.) should reflect themselves in lower prices for consumers rather than higher margins for suppliers.

Price cuts are no longer a re:Invent staple, having been replaced by a regular Amazon tradition of a different flavor: cutting depreciation (and therefore increasing earnings) by extending the effective life of their servers. (These announcements are understandably much more subdued, as "my depreciation is my margin opportunity" doesn’t have quite the same ring to it.)

As compute needs have grown and price cuts have become an increasingly distant memory, some have questioned their sky-high cloud bills, wondering if they should in fact be owning their compute instead of renting it. When we started Oxide, we knew from operating our own public cloud what those economics looked like — and we knew that over time others of a particular scale would come to the same realization that they would be better off not giving their margin away by renting compute. (Though it’s safe to say that we did not predict that it would be DHH leading the charge!)

Owning one’s own cloud sounds great, but there is a bit that’s unsaid: what about the software? Software is essential for elastic, automated infrastructure: hardware alone does not a cloud make! Unfortunately, the traditional server vendors do not help here: because of a PC-era divide in how systems are delivered, customers are told to look elsewhere for any and all system software. This divide is problematic on several levels. First, it impedes the hardware/software co-design that we (and, famously, others!) believe is essential to deliver the best possible product. Second, it leads to infamous finger pointing when the whole thing doesn’t work. But there is also a thorny economic problem: when your hardware and your software don’t come from the same provider, to whom should go the spoils of better hardware?

To someone who has just decided to buy their hardware out of their frustration with renting it, the answer feels obvious: whoever owns the hardware should naturally benefit from its advances! Unfortunately, the enterprise software vendor delivering your infrastructure often has other ideas — and because their software is neither rented nor bought, but rather comes from the hinterlands of software licensing, they have broad latitude as to how it is priced and used. In particular, this allows them to charge based on the hardware that you run it on — to have per-core software licensing.

This galling practice isn’t new (and is in fact as old as symmetric multiprocessing systems), but it has taken on new dimensions in the era of chiplets and packaging innovation: the advances that your next CPU has over your current one are very likely to be expressed in core count. Per-core licensing allows a third party — who neither made the significant investment in developing the next generation of microprocessor nor paid for the part themselves — to exact a tax on improved infrastructure. (And this tax can be shockingly brazen!) Couple this with the elimination of perpetual licensing, and software costs can potentially absorb the entire gain from a next-generation CPU, leaving a disincentive to run newer, more efficient infrastructure. As an industry, we have come to accept this practice, but we shouldn’t: in the go-go era of Dennard scaling (when clock rates rose at a blistering rate), software vendors never would have been allowed to get away with charging by the gigahertz; we should not allow them to feel so emboldened to charge by core count now!

If it needs to be said, we have taken a different approach at Oxide: when you buy the Oxide cloud computer, all of the software to run it is included. This includes all of the software necessary to run the rack as elastic infrastructure: virtual compute, virtual storage, virtual networking. (And yes, it’s all open source — which unfortunately demands the immediate clarification that it’s actually open source rather than pretend open source.) When we add a new feature to our software, there is no licensing enablement or other such nuisance — the feature just comes with the next update. And what happens when AMD releases a new CPU with twice the core count? The new sled running the new CPU runs along your existing rack — you’re not paying more than the cost of the new sled itself. This gives the dividends of Moore’s Law (or Wright’s Law!) to whom they rightfully belong: the users of compute.

The SunOS JDK builder The Trouble with Tribbles...

I've been building OpenJDK on Solaris and illumos for a while.

This has been moderately successful; illumos distributions now have access to up-to-date LTS releases, most of which work well. (At least 11 and 17 are fine; 21 isn't quite right.)

There are even some third-party collections of my patches, primarily for Solaris (as opposed to illumos) builds.

I've added another tool: the SunOS jdk builder.

The aim here is to be able to build every single jdk tag, rather than going to one of the existing repos which only have the current builds. And, yes, you could grope through the git history to get to older builds, but one problem with that is that you can't actually fix problems with past builds.

Most of the content is in the jdk-sunos-patches repository. Here there are patches for both illumos and Solaris (they're ever so slightly different) for every tag I've built.

(That's almost every jdk tag since the Solaris/SPARC/Studio removal, and a few before that. Every so often I find I missed one. And there's been the odd bad patch along the way.)

The idea here is to make it easy to build every tag, and to do so on a current system. I've had to add new patches to get some of the older builds to work. The world has changed, we have newer compilers and other tools, and the OS we're building on has evolved. So if someone wanted to start building the jdk from scratch (and remember that you have to build all the versions in sequence) then this would be useful.

I'm using it for a couple of other things.

One is to put back SPARC support on illumos and Solaris. The initial port I did was on x86 only, so I'm walking through older builds and getting them to work on SPARC. We'll almost certainly not get to jdk21, but 17 seems a reasonable target.

The other thing is to enable the test suites, run them, and hopefully get them clean. At the moment they aren't, but a lot of that is because many tests are OS-specific; they don't know what Solaris is, so they get confused. With all the tags, I can bisect on failures and (hopefully) fix them.
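Because every tag can be built, a regression can be pinned down by binary search over the ordered list of tags, much as git bisect does over commits. The sketch below is one way to structure that, not how the builder actually does it. It assumes the tags are in build order, that there are at least two of them, that the oldest passes and the newest fails, and that once a tag fails every later one fails too. The caller does the expensive part (building the tag at the index returned by tag_bisect_next and running the test) and reports the result; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bisection state over tags indexed 0..count-1. */
struct tag_bisect {
    size_t good; /* highest index known to pass */
    size_t bad;  /* lowest index known to fail  */
};

/* Start a bisection. Assumes count >= 2, tags[0] passes and
 * tags[count - 1] fails. */
void tag_bisect_init(struct tag_bisect *b, size_t count)
{
    b->good = 0;
    b->bad = count - 1;
}

/* True once good and bad are adjacent: b->bad is then the first failing tag. */
bool tag_bisect_done(const struct tag_bisect *b)
{
    return b->bad - b->good <= 1;
}

/* Index of the next tag to build and test (strictly between good and bad). */
size_t tag_bisect_next(const struct tag_bisect *b)
{
    return b->good + (b->bad - b->good) / 2;
}

/* Record whether the tag at index i passed, halving the search range. */
void tag_bisect_report(struct tag_bisect *b, size_t i, bool passed)
{
    if (passed)
        b->good = i;
    else
        b->bad = i;
}
```

Since each iteration means a full JDK build, the halving is what makes this practical: with 200 tags, at most eight builds are needed rather than hundreds.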

What punch cards teach us about AI risk The Observation Deck

I (finally) read Edwin Black’s IBM and the Holocaust, and I can’t recommend it strongly enough. This book had been on my queue for years, and I put it off for the same reason that you have probably put it off: we don’t like to confront difficult things. But the book is superlative: not only is it fascinating and well-researched but given the current level of anxiety about the consequences of technological development, it feels especially timely. Black makes clear in his preface that IBM did not cause the Holocaust (unequivocally, the Holocaust would have happened without IBM), but he also makes clear in the book that information management was essential to every aspect of the Nazi war machine — and that that information management was made possible through IBM equipment and (especially) their punch cards.

Computing before the stored-program computer was largely unknown to me, and two aspects of punch card systems of this era were surprising to me: first, to assure correct operation in these most mechanical of systems, the punch cards themselves must be very precisely composed, manufactured, and handled — and the manufacturing process itself is difficult to replicate. Second, punch cards of this era were essentially single-use items: once a punch card had been through a calculation, it had to be scrapped. Given that IBM was the only creator of punch cards for its machines, this may sound like an early example of the razor blade model, but it is in fact even more lucrative: IBM didn’t sell the machines at a discount because they didn’t sell the machines at all — they rented them. This was an outrageously profitable business model, and a reflection of the most dominant trait of its CEO, Thomas J. Watson: devotion to profit over all else.

In the Nazis, Watson saw a business partner to advance that profit — and they saw in him an American advocate for appeasement, with Hitler awarding Watson its highest civilian medal in 1937. (In this regard, the Nazis themselves didn’t understand that Watson cared only about profit: unlike other American Nazi sympathizers, Watson would support an American war effort if he saw profit in it — and he publicly returned the medal after the invasion of Holland in 1940, when public support of the Nazis had become a clear commercial liability.) A particularly revealing moment with respect to Watson’s disposition was in September 1939 (after the invasion of Poland!) when IBM’s German subsidiary (known at the time as Dehomag) made the case to him that the IBM 405 alphabetizers owned by IBM’s Austrian entity in the annexed Austria now belonged to the German entity to lease as they please. These particular alphabetizers were important: the 405 was an order of magnitude improvement over the IBM 601 — and it was not broadly found in Europe. Watson resisted handing over the Austrian 405s, though not over any point of principle, but rather of avarice: in exchange for the 405s, he demanded (as he had throughout the late 1930s) that he have complete ownership of IBM’s German subsidiary rather than the mere 90% that IBM controlled. The German subsidiary refused the demand and ultimately Watson relented — and the machines effectively became enlisted as German weapons of war.

IBM has made the case that it did not know how its machines were used to effect the Holocaust, but this is hard to believe given Watson’s level of micromanagement of the German subsidiary through Switzerland during the war: IBM knew which machines were where (and knew, for example, that concentration camps all had ample sorters and tabulators), to the point that the company was able to retrieve them all after the war — along with the profits that the machines had earned.

This all has much to teach us about the present day with respect to the true risks of technology. Technology serves as a force-multiplier on humanity, for both better and ill. The most horrific human act — genocide — requires organization and communication, two problems for which we have long developed technological solutions. Whether it was punch cards and tabulators in the Holocaust, radio transmission in the Rwandan Genocide, or Facebook in the Rohingya genocide, technology has sadly been used as an essential tool for our absolute worst. It may be tempting to blame the technology itself, but that in fact absolves the humans at the helm. Should we have stymied the development of tabulators and sorters in the 1920s and 1930s? No, of course not. And nor, for that matter, should Rwanda have been deprived of radio or Myanmar of social media. But this is not to say that we should ignore technology’s role, either: the UN erred in not destroying the radio transmission capabilities in Rwanda; Facebook erred by willfully ignoring the growing anti-Rohingya violence; and IBM emphatically erred by being willing to supply the Nazis in the name of its own profits.

To bring this into the present day: as I relayed in my recent Monktoberfest talk, the fears of AI autonomously destroying humanity are worse than nonsense, because they distract us from the very real possibilities of how AI may be abused. To allow ourselves to even contemplate a prohibition of the development of certain kinds of computer programs is to delude ourselves into thinking that the problem is a technical problem rather than a human one. Worse, the very absurdity of prohibition has itself created a reactionary movement in the so-called “effective accelerationists” who, like some AI equivalent of rolling coal, refuse to contemplate any negative ramifications of technological development whatsoever. This, too, is grievously wrong, and we need look no further than IBM’s involvement in the Holocaust to see the peril of absolute adherence to technology-based profit.

So what course to chart with respect to the (real, human) risks of AI? We should consider another important fact of IBM’s involvement with the Nazis: IBM itself skirted the law. Some of the most interesting findings in Black’s book are from the US Department of Treasury’s 1943 investigation into IBM’s collusion with Hitler. The investigator — Harold Carter — had plenty of evidence that IBM was violating the Trading with the Enemy Act, but Watson had also so thoroughly supported the Allied war effort that he was unassailable within the US. We already have regulatory regimes with respect to safety: you can’t just obtain fissile material or make a bioweapon — it doesn’t matter if ChatGPT told you to do it or not. We should be unafraid to enforce existing laws. Believing that (say) Uber was wrong to illegally put their self-driving cars on the street does not make one a “decel” or whatever — it makes one a believer in the rule of law in a democratic society. That this sounds radical — that one might believe in a democracy that creates laws, affords companies economic freedom within those laws, and enforces those laws against companies that choose to violate them — says much about our divisive times.

And all of this brings us to the broadest lesson of IBM and the Holocaust: technological development is by its nature new — a lurch into the unknown and unexplored — but as I have discovered over and over again in my career, history has much to teach us. Even though the specifics of the technologies we work on may be without precedent, the humanity they serve to advance endures across generations; those who fret about the future would be well advised to learn from the past!

Building up networks of zones on Tribblix The Trouble with Tribbles...

With OpenSolaris and derivatives such as illumos, we gained the ability to build a whole IT infrastructure in a single box, using virtualized networking (crossbow) to build the underlying network and then attaching virtualized systems (zones) atop virtualized storage (zfs).

Some of this was present in Solaris 10, but it didn't have crossbow so the networking piece was a bit tricky (although I did manage to get surprisingly far by abusing the loopback interface).

In Tribblix, I've long had the notion of a router or proxy zone, which acts as a bridge between the outside world and a local virtual subnet. For the next release I've been expanding that into something much more flexible and capable.

What did I need to put this together?

The first thing is a virtual network. You use dladm to create an etherstub. Think of that as a virtual switch you can connect network links to.

To connect that to the world, a zone is created with two network interfaces (vnics): one over the system interface, so it can connect to the outside world, and one over the etherstub.

That special router zone is a little bit more than that. It runs NAT so that any traffic on the internal subnet can get out (simple NAT, nothing complicated here). To do that, the zone has to have IPFilter installed, and the zone creation script creates the right ipnat configuration file and ensures that IPFilter is started.
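For illustration, here is roughly what "simple NAT" amounts to in IPFilter terms. This is a sketch under assumptions: the interface name, the subnet, and the exact rules are illustrative, and the configuration file zap generates may well look different. The function just formats the two classic ipnat map rules for an internal subnet.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Write IPFilter NAT rules (ipnat.conf syntax) that masquerade an
 * internal subnet behind the router zone's external interface.
 *   ext_if: the router zone's outward-facing link (name is an assumption)
 *   subnet: the internal subnet in CIDR form, e.g. "10.1.0.0/24"
 * Returns -1 if subnet has no prefix length; otherwise returns what
 * snprintf returns, so a value >= len means buf was too small.
 */
int format_ipnat_rules(char *buf, size_t len, const char *ext_if, const char *subnet)
{
    /* Insist on an explicit prefix length, so the rules cover the whole
     * subnet rather than a single address. */
    if (strchr(subnet, '/') == NULL)
        return -1;

    /* Rule 1: TCP and UDP get their source ports rewritten, so many
     *         internal hosts can share one external address ("0/32"
     *         means "the address of ext_if").
     * Rule 2: everything else, such as ICMP, is mapped without ports. */
    return snprintf(buf, len,
                    "map %s %s -> 0/32 portmap tcp/udp auto\n"
                    "map %s %s -> 0/32\n",
                    ext_if, subnet, ext_if, subnet);
}
```

Called with "net0" and "10.1.0.0/24", it produces "map net0 10.1.0.0/24 -> 0/32 portmap tcp/udp auto" followed by "map net0 10.1.0.0/24 -> 0/32". The portmap rule has to come first; otherwise the catch-all rule would match TCP and UDP too.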

You also need to have IPFilter installed in the global zone. It doesn't have to be running there, but the installation is required to create the IPFilter devices. Those IPFilter devices are then exposed to the zone, and for that to work the zone needs to use exclusive-ip networking rather than shared-ip (and would need to do so anyway for packet forwarding to work).

One thing I learnt was that you can't lock the router zone's networking down with allowed-address. The anti-spoofing protection that allowed-address gives you prevents forwarding and breaks NAT.

The router zone also has a couple of extra pieces of software installed. The first is haproxy, which is intended as an ingress controller. That's not currently used, and could be replaced by something else. The second is dnsmasq, which is used as a dhcp server to configure any zones that get connected to the subnet.

With a network segment in place, and a router zone for management, you can then create extra zones.

The way this works in Tribblix is that if you tell zap to create a zone with an IP address that is part of a private subnet, it will attach its network to the corresponding etherstub. That works fine for an exclusive-ip zone, where the vnic can be created directly over the etherstub.
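The subnet matching that decides where a zone's network goes can be sketched as below. This is a guess at the logic, not zap's code: it assumes zap keeps a list of internal subnets (network, prefix length, and the etherstub behind each), which is a hypothetical layout, and it simply returns the etherstub whose subnet contains the requested address, or NULL if none does.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>

/* One internal subnet and the etherstub that backs it (hypothetical
 * bookkeeping; zap's saved state may look quite different). */
struct private_net {
    const char *network;   /* e.g. "10.1.0.0" */
    unsigned prefix_len;   /* e.g. 24 */
    const char *etherstub; /* e.g. "stub0" (assumed naming) */
};

/* Parse dotted-quad IPv4 text into a host-order integer; 0 on failure. */
static int parse_ipv4(const char *text, uint32_t *out)
{
    struct in_addr a;
    if (inet_pton(AF_INET, text, &a) != 1)
        return 0;
    *out = ntohl(a.s_addr);
    return 1;
}

/* Netmask for a prefix length, avoiding the undefined shift by 32. */
static uint32_t prefix_mask(unsigned prefix_len)
{
    return prefix_len == 0 ? 0 : (uint32_t)(0xFFFFFFFFu << (32 - prefix_len));
}

/* Return the etherstub whose subnet contains ip, or NULL if the address
 * is not on any known internal subnet (or does not parse). */
const char *etherstub_for_ip(const char *ip, const struct private_net *nets, size_t count)
{
    uint32_t addr, net;
    if (!parse_ipv4(ip, &addr))
        return NULL;
    for (size_t i = 0; i < count; i++) {
        if (nets[i].prefix_len > 32 || !parse_ipv4(nets[i].network, &net))
            continue; /* skip malformed entries */
        uint32_t mask = prefix_mask(nets[i].prefix_len);
        if ((addr & mask) == (net & mask))
            return nets[i].etherstub;
    }
    return NULL;
}
```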

For shared-ip zones it's a bit trickier. The etherstub isn't a real network device, although for some purposes (like creating a vnic) it looks like one. To allow shared-ip, I create a dedicated shared vnic over the etherstub, and the virtual addresses for shared-ip zones are associated with that vnic. For this to work, it has to be plumbed in the global zone, but doesn't need an address there. The downside to the shared-ip setup (or it might be an upside, depending on what the zone's going to be used for) is that in this configuration it doesn't get a network route; normally this would be inherited off the parent interface, but there isn't an IP configuration associated with the vnic in the global zone.

The shared-ip zone is handed its IP address. For exclusive-ip zones, the right configuration fragment is poked into dnsmasq on the router zone, so that if the zone asks via dhcp it will get the answer you configured. Generally, though, if I can directly configure the zone I will. And that's either by putting the right configuration into the files in a zone so it implements the right networking at boot, or via cloud-init. (Or, in the case of a solaris10 zone, I populate sysidcfg.)
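The dnsmasq fragment for an exclusive-ip zone can be as small as a single dhcp-host line. The sketch below formats one, assuming the zone is identified by the MAC address of its vnic; that choice, the function name, and the example values are assumptions rather than what zap actually writes.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a dnsmasq configuration line that reserves a fixed address for
 * a zone, keyed on the MAC address of the zone's vnic. Matching on the
 * MAC is an assumption here; dhcp-host can also match on a hostname.
 * Returns -1 if any field contains a comma (dnsmasq splits the value on
 * commas); otherwise returns what snprintf returns.
 */
int format_dhcp_host(char *buf, size_t len, const char *mac, const char *ip, const char *hostname)
{
    if (strchr(mac, ',') != NULL || strchr(ip, ',') != NULL || strchr(hostname, ',') != NULL)
        return -1;
    /* Field order: hardware address, IP address to hand out, hostname. */
    return snprintf(buf, len, "dhcp-host=%s,%s,%s\n", mac, ip, hostname);
}
```

For example, MAC 02:08:20:aa:bb:cc, address 10.1.0.17 and name web1 give "dhcp-host=02:08:20:aa:bb:cc,10.1.0.17,web1".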

There are actually a lot of steps here, and doing it by hand would be rather (ahem, very) tedious. So it's all automated by zap, the package and system administration tool in Tribblix. The user asks for a router zone, and all it needs to be given is the zone's name, the public IP address, and the subnet address, and all the work will be done automatically. It saves all the required details so that they can be picked up later. Likewise for a regular zone, it will do all the configuration based on the IP address you specify, with no extra input required from the user.

The whole aim here is to make building zones, and whole systems of zones, much easier and more reliable. And there's still a lot more capability to add.

A Gap in the TrustZone preset settings for the LPC55S69 Oxide Computer Company Blog

We’re very excited to have announced the general availability of our cloud computer. As part of this work, we continue to build on top of the LPC55S69 from NXP as our Root of Trust. We’ve discovered some gaps when using TrustZone preset settings on the LPC55S69 that can allow for unexpected behavior including enabling debug settings and exposure of the UDS (Unique Device Secret). These issues require a signed image or access at manufacturing time.

How to (safely, securely) configure a chip

The LPC55S69 uses the Armv8-M architecture, which includes TrustZone-M. We’ve previously discussed some aspects of the Armv8-M architecture and presented on it in more detail. Fundamentally, setting up TrustZone-M is simply a matter of putting the right values in the right registers. The word "simply" is, of course, doing a lot of heavy lifting here. TrustZone-M must also be set up in conjunction with the Memory Protection Unit (MPU) and any other vendor-specific security settings. Once the ideal settings have been decided upon, there’s still the matter of actually performing the register programming sequence. NXP offers a feature called TrustZone preset data to make this programming easier. Register data may optionally be appended to the end of an image for the LPC55S69, and the ROM will set the registers before jumping into the user image. Some of those registers may also be configured to prevent further modification. This means the user image does not need to be concerned with the settings for those registers.

The structure used to configure the registers looks like the following:

#include <stdint.h>

typedef struct _tzm_secure_config
{
  uint32_t cm33_vtor_addr;  /*! CM33 Secure vector table address */
  uint32_t cm33_vtor_ns_addr; /*! CM33 Non-secure vector table address */
  uint32_t cm33_nvic_itns0; /*! CM33 Interrupt target non-secure register 0 */
  uint32_t cm33_nvic_itns1; /*! CM33 Interrupt target non-secure register 1 */
  uint32_t mcm33_vtor_addr; /*! MCM33 Secure vector table address */
  uint32_t cm33_mpu_ctrl; /*! MPU Control Register.*/
  uint32_t cm33_mpu_mair0; /*! MPU Memory Attribute Indirection Register 0 */
  uint32_t cm33_mpu_mair1; /*! MPU Memory Attribute Indirection Register 1 */
  uint32_t cm33_mpu_rbar0; /*! MPU Region 0 Base Address Register */
  uint32_t cm33_mpu_rlar0; /*! MPU Region 0 Limit Address Register */
  uint32_t cm33_mpu_rbar1; /*! MPU Region 1 Base Address Register */
  uint32_t cm33_mpu_rlar1; /*! MPU Region 1 Limit Address Register */
  uint32_t cm33_mpu_rbar2; /*! MPU Region 2 Base Address Register */
  uint32_t cm33_mpu_rlar2; /*! MPU Region 2 Limit Address Register */
  uint32_t cm33_mpu_rbar3; /*! MPU Region 3 Base Address Register */
  uint32_t cm33_mpu_rlar3; /*! MPU Region 3 Limit Address Register */
  uint32_t cm33_mpu_rbar4; /*! MPU Region 4 Base Address Register */
  uint32_t cm33_mpu_rlar4; /*! MPU Region 4 Limit Address Register */
  uint32_t cm33_mpu_rbar5; /*! MPU Region 5 Base Address Register */
  uint32_t cm33_mpu_rlar5; /*! MPU Region 5 Limit Address Register */
  uint32_t cm33_mpu_rbar6; /*! MPU Region 6 Base Address Register */
  uint32_t cm33_mpu_rlar6; /*! MPU Region 6 Limit Address Register */
  uint32_t cm33_mpu_rbar7; /*! MPU Region 7 Base Address Register */
  uint32_t cm33_mpu_rlar7; /*! MPU Region 7 Limit Address Register */
  uint32_t cm33_mpu_ctrl_ns; /*! Non-secure MPU Control Register.*/
  uint32_t cm33_mpu_mair0_ns; /*! Non-secure MPU Memory Attribute Register 0 */
  uint32_t cm33_mpu_mair1_ns; /*! Non-secure MPU Memory Attribute Register 1 */
  uint32_t cm33_mpu_rbar0_ns; /*! Non-secure MPU Region 0 Base Address Register */
  uint32_t cm33_mpu_rlar0_ns; /*! Non-secure MPU Region 0 Limit Address Register */
  uint32_t cm33_mpu_rbar1_ns; /*! Non-secure MPU Region 1 Base Address Register */
  uint32_t cm33_mpu_rlar1_ns; /*! Non-secure MPU Region 1 Limit Address Register */
  uint32_t cm33_mpu_rbar2_ns; /*! Non-secure MPU Region 2 Base Address Register */
  uint32_t cm33_mpu_rlar2_ns; /*! Non-secure MPU Region 2 Limit Address Register */
  uint32_t cm33_mpu_rbar3_ns; /*! Non-secure MPU Region 3 Base Address Register */
  uint32_t cm33_mpu_rlar3_ns; /*! Non-secure MPU Region 3 Limit Address Register */
  uint32_t cm33_mpu_rbar4_ns; /*! Non-secure MPU Region 4 Base Address Register */
  uint32_t cm33_mpu_rlar4_ns; /*! Non-secure MPU Region 4 Limit Address Register */
  uint32_t cm33_mpu_rbar5_ns; /*! Non-secure MPU Region 5 Base Address Register */
  uint32_t cm33_mpu_rlar5_ns; /*! Non-secure MPU Region 5 Limit Address Register */
  uint32_t cm33_mpu_rbar6_ns; /*! Non-secure MPU Region 6 Base Address Register */
  uint32_t cm33_mpu_rlar6_ns; /*! Non-secure MPU Region 6 Limit Address Register */
  uint32_t cm33_mpu_rbar7_ns; /*! Non-secure MPU Region 7 Base Address Register */
  uint32_t cm33_mpu_rlar7_ns; /*! Non-secure MPU Region 7 Limit Address Register */
  uint32_t cm33_sau_ctrl;/*! SAU Control Register */
  uint32_t cm33_sau_rbar0;/*! SAU Region 0 Base Address Register */
  uint32_t cm33_sau_rlar0;/*! SAU Region 0 Limit Address Register */
  uint32_t cm33_sau_rbar1;/*! SAU Region 1 Base Address Register */
  uint32_t cm33_sau_rlar1;/*! SAU Region 1 Limit Address Register */
  uint32_t cm33_sau_rbar2;/*! SAU Region 2 Base Address Register */
  uint32_t cm33_sau_rlar2;/*! SAU Region 2 Limit Address Register */
  uint32_t cm33_sau_rbar3;/*! SAU Region 3 Base Address Register */
  uint32_t cm33_sau_rlar3;/*! SAU Region 3 Limit Address Register */
  uint32_t cm33_sau_rbar4;/*! SAU Region 4 Base Address Register */
  uint32_t cm33_sau_rlar4;/*! SAU Region 4 Limit Address Register */
  uint32_t cm33_sau_rbar5;/*! SAU Region 5 Base Address Register */
  uint32_t cm33_sau_rlar5;/*! SAU Region 5 Limit Address Register */
  uint32_t cm33_sau_rbar6;/*! SAU Region 6 Base Address Register */
  uint32_t cm33_sau_rlar6;/*! SAU Region 6 Limit Address Register */
  uint32_t cm33_sau_rbar7;/*! SAU Region 7 Base Address Register */
  uint32_t cm33_sau_rlar7;/*! SAU Region 7 Limit Address Register */
  uint32_t flash_rom_slave_rule;/*! FLASH/ROM Slave Rule Register 0 */
  uint32_t flash_mem_rule0;/*! FLASH Memory Rule Register 0 */
  uint32_t flash_mem_rule1;/*! FLASH Memory Rule Register 1 */
  uint32_t flash_mem_rule2;/*! FLASH Memory Rule Register 2 */
  uint32_t rom_mem_rule0;/*! ROM Memory Rule Register 0 */
  uint32_t rom_mem_rule1;/*! ROM Memory Rule Register 1 */
  uint32_t rom_mem_rule2;/*! ROM Memory Rule Register 2 */
  uint32_t rom_mem_rule3;/*! ROM Memory Rule Register 3 */
  uint32_t ramx_slave_rule;/*! RAMX Slave Rule Register */
  uint32_t ramx_mem_rule0;/*! RAMX Memory Rule Register 0 */
  uint32_t ram0_slave_rule;/*! RAM0 Slave Rule Register */
  uint32_t ram0_mem_rule0;/*! RAM0 Memory Rule Register 0 */
  uint32_t ram0_mem_rule1;/*! RAM0 Memory Rule Register 1 */
  uint32_t ram1_slave_rule; /*! RAM1 Slave Rule Register */
  uint32_t ram1_mem_rule1;/*! RAM1 Memory Rule Register 1 */
  uint32_t ram2_mem_rule1;/*! RAM2 Memory Rule Register 1 */
  uint32_t ram3_mem_rule0;/*! RAM3 Memory Rule Register 0 */
  uint32_t ram3_mem_rule1;/*! RAM3 Memory Rule Register 1 */
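  uint32_t ram4_slave_rule;/*! RAM4 Slave Rule Register */
  uint32_t ram2_mem_rule0;/*! RAM2 Memory Rule Register 0 */
  uint32_t ram3_slave_rule;/*! RAM3 Slave Rule Register */
  uint32_t ram1_mem_rule0;/*! RAM1 Memory Rule Register 0 */
  uint32_t ram2_slave_rule;/*! RAM2 Slave Rule Register */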
  uint32_t ram4_mem_rule0;/*! RAM4 Memory Rule Register 0 */
  uint32_t apb_grp_slave_rule;/*! APB Bridge Group Slave Rule Register */
  uint32_t apb_grp0_mem_rule0;/*! APB Bridge Group 0 Memory Rule Register 0 */
  uint32_t apb_grp0_mem_rule1;/*! APB Bridge Group 0 Memory Rule Register 1 */
  uint32_t apb_grp0_mem_rule2;/*! APB Bridge Group 0 Memory Rule Register 2 */
  uint32_t apb_grp0_mem_rule3;/*! APB Bridge Group 0 Memory Rule Register 3 */
  uint32_t apb_grp1_mem_rule0;/*! APB Bridge Group 1 Memory Rule Register 0 */
  uint32_t apb_grp1_mem_rule1;/*! APB Bridge Group 1 Memory Rule Register 1 */
  uint32_t apb_grp1_mem_rule2;/*! APB Bridge Group 1 Memory Rule Register 2 */
  uint32_t apb_grp1_mem_rule3;/*! APB Bridge Group 1 Memory Rule Register 3 */
  uint32_t ahb_periph0_slave_rule0;/*! AHB Peripherals 0 Slave Rule Register 0 */
  uint32_t ahb_periph0_slave_rule1;/*! AHB Peripherals 0 Slave Rule Register 1 */
  uint32_t ahb_periph1_slave_rule0;/*! AHB Peripherals 1 Slave Rule Register 0 */
  uint32_t ahb_periph1_slave_rule1;/*! AHB Peripherals 1 Slave Rule Register 1 */
  uint32_t ahb_periph2_slave_rule0;/*! AHB Peripherals 2 Slave Rule Register 0 */
  uint32_t ahb_periph2_slave_rule1;/*! AHB Peripherals 2 Slave Rule Register 1 */
  uint32_t ahb_periph2_mem_rule0;/*! AHB Peripherals 2 Memory Rule Register 0*/
  uint32_t usb_hs_slave_rule0; /*! HS USB Slave Rule Register 0 */
  uint32_t usb_hs__mem_rule0; /*! HS USB Memory Rule Register 0 */
  uint32_t sec_gp_reg0;/*! Secure GPIO Register 0 */
  uint32_t sec_gp_reg1;/*! Secure GPIO Register 1 */
  uint32_t sec_gp_reg2;/*! Secure GPIO Register 2 */
  uint32_t sec_gp_reg3;/*! Secure GPIO Register 3 */
  uint32_t sec_int_reg0;/*! Secure Interrupt Mask for CPU1 Register 0 */
  uint32_t sec_int_reg1;/*! Secure Interrupt Mask for CPU1 Register 1 */
  uint32_t sec_gp_reg_lock;/*! Secure GPIO Lock Register */
  uint32_t master_sec_reg;/*! Master Secure Level Register */
  uint32_t master_sec_anti_pol_reg;/*! Master Secure Level Anti-Pole Register */
  uint32_t cm33_lock_reg; /*! CM33 Lock Control Register */
  uint32_t mcm33_lock_reg; /*! MCM33 Lock Control Register */
  uint32_t misc_ctrl_dp_reg;/*! Secure Control Duplicate Register */
  uint32_t misc_ctrl_reg;/*! Secure Control Register */
  uint32_t misc_tzm_settings;
} tzm_secure_config_t;

An implementation detail of the ROM is that the settings for these registers are (mostly) applied in the order shown in the structure. This means that the very first register that gets changed is VTOR which switches the vector table from the one in the ROM to the user provided one. Any faults that occur after VTOR is changed will be handled by user code, not ROM code. This turns out to have some "interesting" side effects.

(Un)locking debug access

The LPC55S69 offers debug access via standard ARM interfaces (SWD). Debug access can be configured to be always available, always disabled, or only available to authenticated users. These settings are designed to be applied at manufacturing time via the CMPA region. Debugging is disabled by default while executing in the ROM and only enabled (if allowed) as the very last step before jumping to user code. The debug settings are also locked out, preventing further modification from user code except in specific authenticated circumstances. Because debug access is highly sensitive, it makes sense to minimize the amount of time the ROM spends with it enabled.

If the debug settings are applied last, the TrustZone preset settings must be applied before them. Combined with the implementation detail of how the preset settings are applied, this means that if the code faults after VTOR is changed but before the debug settings are applied, it becomes possible to run user-controlled code with the debug registers open for modification.
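The ordering problem is easier to see in a toy model. This is not NXP's ROM code, and the register names are only labels; the model simply encodes the behavior described here: preset writes are applied in structure order, the CMPA debug settings are applied and locked as the final step, and a fault is handled by whichever vector table is live at the time.

```c
#include <stdbool.h>
#include <stddef.h>

/* Whose vector table is live, i.e. whose handler runs if a fault occurs. */
enum owner { OWNER_ROM, OWNER_USER };

/* One preset register write, as far as this toy model cares. */
struct preset_write {
    const char *reg;      /* label only, e.g. "VTOR" */
    bool sets_user_vtor;  /* switches fault handling to the user's vector table */
    bool causes_fault;    /* execution faults right after this write */
};

struct boot_result {
    enum owner next_code; /* whose code runs next */
    bool debug_locked;    /* were the CMPA debug settings applied and locked? */
};

/* Apply preset writes in order, then lock debug as the very last step. */
struct boot_result run_boot(const struct preset_write *writes, size_t n)
{
    enum owner vectors = OWNER_ROM;
    for (size_t i = 0; i < n; i++) {
        if (writes[i].sets_user_vtor)
            vectors = OWNER_USER;
        if (writes[i].causes_fault) {
            /* The fault goes to whoever owns VTOR now, and the debug
             * step below is never reached. */
            return (struct boot_result){ vectors, false };
        }
    }
    /* Normal path: apply and lock debug settings, then jump to the user image. */
    return (struct boot_result){ OWNER_USER, true };
}
```

With VTOR switched first and a later write that faults, the model hands control to user code with debug never locked, which is the gap described above.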

How easy is it to actually trigger this? Very easy. Other registers in the preset structure include settings for the MPU, and setting the enable bit in MPU_CTRL without configuring any regions is enough to trigger the fault. NXP's manual does say that you need to make sure the entire ROM region is configured as secure, privileged, and executable, otherwise the "boot process will fail". Here, "fail" means vectoring off into the appropriate fault handler in the user code.
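To see why enabling the MPU with no regions faults, here is a partial model of the Armv8-M MPU check for a privileged instruction fetch. It only looks at region matching, the XN bit, and MPU_CTRL.PRIVDEFENA (the bit that lets privileged code fall back to the default memory map when no region matches); it is a simplification, not a complete implementation of the architecture's rules. The ROM address used in the test, 0x13000000, is assumed to be the secure alias of the LPC55S69 boot ROM.

```c
#include <stdbool.h>
#include <stdint.h>

/* Armv8-M MPU bits used below. */
#define MPU_CTRL_ENABLE     (1u << 0)   /* MPU_CTRL.ENABLE */
#define MPU_CTRL_PRIVDEFENA (1u << 2)   /* MPU_CTRL.PRIVDEFENA */
#define MPU_RBAR_XN         (1u << 0)   /* MPU_RBAR.XN: execute never */
#define MPU_RLAR_EN         (1u << 0)   /* MPU_RLAR.EN: region enable */
#define MPU_ADDR_MASK       0xFFFFFFE0u /* base/limit use 32-byte granules */

/* Would a privileged instruction fetch from addr raise MemManage under
 * this MPU configuration? (Partial model: ignores default-map XN areas
 * and other attributes.) */
bool priv_fetch_faults(uint32_t addr, uint32_t ctrl,
                       const uint32_t rbar[8], const uint32_t rlar[8])
{
    if (!(ctrl & MPU_CTRL_ENABLE))
        return false; /* MPU off: the default memory map applies */

    int hits = 0;
    bool xn = false;
    for (int i = 0; i < 8; i++) {
        if (!(rlar[i] & MPU_RLAR_EN))
            continue;
        uint32_t base = rbar[i] & MPU_ADDR_MASK;
        uint32_t limit = (rlar[i] & MPU_ADDR_MASK) | 0x1Fu; /* inclusive */
        if (addr >= base && addr <= limit) {
            hits++;
            xn = (rbar[i] & MPU_RBAR_XN) != 0;
        }
    }
    if (hits > 1)
        return true; /* an address hitting overlapping regions faults */
    if (hits == 1)
        return xn;
    /* No region matched: fall back to the default map only with PRIVDEFENA. */
    return !(ctrl & MPU_CTRL_PRIVDEFENA);
}
```

With ENABLE set, PRIVDEFENA clear, and no regions, every fetch from the ROM misses all regions and faults, whereas a region covering the ROM with XN clear (or setting PRIVDEFENA) avoids it.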

This makes the following sequence possible:

  • Have debug disabled in the CMPA

  • Sign an image with TrustZone preset settings with a valid VTOR and MPU settings that exclude the ROM region

  • Have the MemManage fault handler follow the standard sequence to enable debugging

  • The image will trigger the fault handler and have debugging enabled despite the settings in the CMPA
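The sequence above can be illustrated with a toy model. This is a minimal sketch, not NXP ROM code: the step names, the `state` dictionary, the fault condition, and the handler wiring are all assumptions chosen to mirror the prose. It only shows why applying VTOR first and the debug lockout last leaves a window in which user code runs with debug access still unlocked.

```python
# Toy model only: this is not NXP ROM code. Step names, the fault condition,
# and the handler wiring are assumptions chosen to mirror the prose above.

class MemManageFault(Exception):
    """Stands in for the fault raised when the MPU blocks ROM execution."""


def run_boot(preset, cmpa_debug_allowed, user_fault_handler):
    state = {"vtor": "rom", "debug_enabled": False, "debug_locked": False}

    def apply_vtor():
        # First preset register applied: faults now vector to user code.
        state["vtor"] = preset["vtor"]

    def apply_mpu():
        # Enabling the MPU without a region covering the ROM faults at once.
        if preset["mpu_enable"] and not preset["mpu_covers_rom"]:
            raise MemManageFault()

    def apply_debug():
        # Very last ROM step: apply the CMPA debug policy and lock it.
        state["debug_enabled"] = cmpa_debug_allowed
        state["debug_locked"] = True

    try:
        for step in (apply_vtor, apply_mpu, apply_debug):  # structure order
            step()
    except MemManageFault:
        # If VTOR still pointed at the ROM, the ROM would handle the fault
        # (not modelled here); otherwise user code runs before the lockout.
        if state["vtor"] != "rom":
            user_fault_handler(state)
    return state


def hostile_handler(state):
    # Because the lockout never happened, user code can open debug access.
    if not state["debug_locked"]:
        state["debug_enabled"] = True


bad_preset = {"vtor": "user", "mpu_enable": True, "mpu_covers_rom": False}
good_preset = {"vtor": "user", "mpu_enable": True, "mpu_covers_rom": True}

print(run_boot(bad_preset, cmpa_debug_allowed=False, user_fault_handler=hostile_handler))
# {'vtor': 'user', 'debug_enabled': True, 'debug_locked': False}
```

With `good_preset` the MPU step succeeds, the debug step runs, and the CMPA policy (debug disabled) is applied and locked as intended.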

This does require access to the secure boot signing key, but it’s a departure from the presentation of the CMPA settings as being independent of any possible settings in an image.

Extracting the UDS

One additional step in the setting of the debug registers is a final lockout of some PUF registers. The PUF (Physically Unclonable Function) is designed to tie secrets to a specific chip. When a secret is PUF encoded, it can only be decoded by that specific chip. The LPC55S69 uses the PUF to encode the Unique Device Secret (UDS) for use as the basis of a DICE identity. To ensure the identity is tied to the specific chip and cannot be cloned, access to the PUF index for the UDS is locked out after it is used.

The UDS is always locked out for secure boot images, but for non-secure images the ROM relies on the debug-settings code path to perform the lockout. TrustZone preset settings can be used with non-secure CRC images, which means that the previously described issue can be used to extract the UDS, since the final lockout never occurs.

Requiring an unsigned image significantly limits the impact to cases such as the following:

  • Attacker at manufacturing time runs ISP command to generate the UDS on an unprogrammed LPC55S69

  • Attacker runs an unsigned image with a buggy TrustZone preset to extract the UDS

  • Attacker continues on with the rest of the manufacturing sequence, making sure not to re-generate the extracted UDS

This may be mitigated with sufficient tooling at manufacturing time, but the issue remains.

Is this a security issue?

There was disagreement between Oxide and NXP about whether this qualified as a true security vulnerability (Oxide’s opinion) vs. a gap in design and documentation (NXP’s opinion). The areas of disagreement were related to what exactly it was possible to do with these issues and what was required to make them happen. Unlocking the debug ports requires access to the secure boot signing keys and arguably if you can sign something with a bad TrustZone preset you don’t need to bother with debug port access; once your secure boot integrity has been compromised all bets are off. Oxide believes this undersells the potential for mitigation and why this should be considered a security issue: there could be circumstances where having debug port access would make extracting assets significantly easier.

Transparency is an Oxide value and that is what we strive for in bug reporting. Our goal is to make sure that issues are acknowledged and information about the bug is made widely available. NXP agreed to acknowledge this issue as a non-security errata and there will not be a CVE filed at this time. Given the narrow scope and lack of agreement between Oxide and NXP, filing a CVE would provide little benefit. If new information were to come to light from Oxide, NXP, or other researchers who are interested in our findings, we would re-evaluate this decision.

We are pleased that NXP is choosing to protect its customers by informing them of this gap. A bigger takeaway from this issue is to understand the limitations of secure/verified boot. A proper secure boot implementation will ensure that the only code that runs on a device is code that has been signed with an appropriate private key. Secure boot provides no assertions about the implementation of that code. The strength of secure boot is bounded by the code you choose to sign. In the absence of a fix for this errata, we will not be using the TrustZone preset data. If other customers choose to continue using TrustZone preset data they will need to be diligent about validating their inputs to avoid introducing gaps in the security model. Oxide has a commitment to open firmware to ensure our customers can have confidence in what code we will be signing to run on their machines.

Timeline

2023-08-16

Oxide discovers issue while reviewing image settings

2023-08-21

Oxide discloses issue to NXP PSIRT with a disclosure deadline of 2023-11-20

2023-08-21

NXP PSIRT acknowledges receipt

2023-10-11

NXP requests meeting to discuss the report with Oxide

2023-10-19

Oxide and NXP meet to discuss the reported issues

2023-10-23

Oxide suggests documentation clarifications

2023-10-27

NXP agrees to issue an errata

2023-11-20

Oxide publishes this blog post as a disclosure

Is it worse for John Fisher? The Observation Deck

“It’s been worse for me than for you.” These extraordinary words came out of the mouth of John Fisher, incompetent owner of the Oakland Athletics, on the eve of getting approval from Major League Baseball to rip its roots out of the East Bay.

I have been reflecting a lot on these words. Strictly from a public relations point of view, they are gobsmackingly disrespectful, plumbing new depths of malpractice even for the worst ownership in sports. And of course, they are obviously wrong, as this clumsy move is worse for literally everyone else than it is for John Fisher. It is worse for the fans having their hearts ripped out; worse for the Oakland employees losing their jobs; worse for the many small businesses that make their livelihood on the team; worse for the players who have been told their entire athletic careers to take accountability only to be forced to watch in silence as their skinflint ownership takes none.

But there is a kind of truth to these words too, in that there are ways that it is worse for Fisher, for we have things that he cannot. Take, for example, the Reverse Boycott, the game on June 13th, 2023 when Oakland fans deliberately attended to show that we are, in fact, not the problem. Everything about that game was extraordinary: the energy was post-season electric as the worst-in-baseball A’s entered the game with a best-in-baseball win streak. The Coliseum was rocking, in a way that only the Coliseum can. Then, at the top of the 5th inning, the fans fell silent in protest of the move to Las Vegas. There was no plan beyond this; no one really knew what would happen when the silence ended. What happened next was spontaneous, straight from a shared heart that was breaking: a deafening chant, rolling and crashing over the stadium. “SELL! THE! TEAM! SELL! THE! TEAM!” (I accidentally recorded this; you can hear the emotion in my own voice — and that of my 11-year-old daughter next to me.) The game ended as only fiction would have it: with Trevor May striking out the best team in baseball to seal an improbable win for Oakland. The biggest surprise of the night was the sheer joy of it all: it was a New Orleans funeral for Oakland baseball, and we were glad to be there as a family. As I told my kids on the drive home, it was a night that they would one day tell their own grandchildren about.

How is it that a baseball game can conjure such emotion, let alone one from a losing franchise with a signed death warrant? Because, simply: sports are about much more than what’s on the field. Sports bring us together — they bind us across generation, disposition, and circumstance. A family that might agree on little else may shout in indignant agreement that that wasn’t pass interference or that he was obviously safe. They give us solidarity with one another: they give us stuff to believe in together, to celebrate together — and to grieve for together. In short, sports are the raw id of our own humanity. The Reverse Boycott distilled all of it into a single, singular night — binding us together in the kind of shared adversity that has always been the stuff of tribal legend.

And it is in this regard that John Fisher might be right: it is, in fact worse for him, because this shared humanity of sports eludes him. His camera roll is not filled with A’s-themed birthday parties, or of selfies with his kids in rally caps, or of toddlers running the bases late on a Sunday afternoon. It would be tempting to say that he instead sees sports as only a business, but even this gives him too much credit: the only business he knows is assuring the mechanics of inheritance — of hoarding the spoils of his birth. In this regard, he is at least passably capable: he took MLB at its word that it would cut off his welfare payments if he did not secure a stadium deal by January 2024, and dutifully secured a deal, however obviously disastrous. It’s worse for John Fisher because this has all been laid bare: the real cost of securing his allowance is that his ineptitude is no longer merely an open secret among beleaguered A’s fans — he is now MLB’s famous failson, the Connor Roy of professional sports.

Whatever success John Fisher may find in Las Vegas, he will not be able to outrun the wreckage he is leaving behind here in Oakland. John Fisher’s obituary will not speak of what he built, but of what he broke; not of what he gave, but of what he took away. He will be a stain on his family, who will spend their lives trying to apologize for him. He himself will find that no amount of success will absolve him of the scar that he is leaving on the East Bay’s heart. And the much more likely scenario, abject commercial failure, will merely confirm for him his own nightmares: that he is exactly the klutz and dunce that he surely fears himself to be. And if John Fisher will always be searching for what he cannot get, we Oakland A’s fans will always have what cannot be taken away: our solidarity with, and love for, one another. We are raucous, brainy, creative, and eclectic; our lives are richer for having one another in them. John Fisher has none of this, and never will. As terrible as it is for us, it may indeed be worse for him.

OmniOS Community Edition r151048 OmniOS Community Edition

OmniOSce v11 r151048 is out!

On the 6th of November 2023, the OmniOSce Association released a new stable version of OmniOS - The Open Source Enterprise Server OS. The release comes with many tool updates, brand-new features and additional hardware support. For details see the release notes.

Note that r151044 is now end-of-life. You should upgrade to r151046 or r151048 to stay on a supported track. r151046 is an LTS release with support until May 2026, and r151048 is a stable release with support until November 2024.

For anyone who tracks LTS releases, the previous LTS - r151038 - now enters its last six months. You should plan to upgrade to r151046 for continued LTS support.

OmniOS is fully Open Source and free. Nevertheless, it takes a lot of time and money to keep maintaining a full-blown operating system distribution. Our statistics show that there are almost 2’000 active installations of OmniOS while fewer than 20 people send regular contributions. If your organisation uses OmniOS based servers, please consider becoming a regular patron or taking out a support contract.


Any problems or questions, please get in touch.

Keeping python modules in check The Trouble with Tribbles...

Any operating system distribution - and Tribblix is no different - will have a bunch of packages for python modules.

And one thing about python modules is that they tend to depend on other python modules. Sometimes a lot of python modules. Not only that, the dependency will be on a specific version - or range of versions - of particular modules.

Which opens up the possibility that two different modules might require incompatible versions of a module they both depend on.

For a long time, I was a bit lax about this. Most of the time you can get away with it (often because module writers are excessively cautious about newer versions of their dependencies). But occasionally I got bitten by upgrading a module and breaking something that used it, or breaking it because a dependency hadn't been updated to match.

So now I always check that I've got all the dependencies listed in packaging with

pip3 show modulename

and every time I update a module I check the dependencies aren't broken with

pip3 check

Of course, this relies on the machine having all the (interesting) modules installed, but on my main build machine that is generally true.

If an incompatibility is picked up by pip3 check then I'll either not do the update, or update any other modules to keep in sync. If an update is impossible, I'll take a note of which modules are blockers, and wait until they get an update to unjam the process.

A case in point was that urllib3 went to version 2.x recently. At first, nothing would allow that, so I couldn't update urllib3 at all. Now we're in a situation where I have one module I use that won't allow me to update urllib3, and I am starting to see a few modules requiring urllib3 to be updated, so those are held downrev for the time being.
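The kind of check pip3 check performs, and the urllib3 jam described above, can be sketched in a few lines. This is a simplified model, not pip's implementation: it understands only purely numeric versions and the six basic comparison operators (real pip follows PEP 440 via the `packaging` library), and the module names `module-a` and `module-b` are hypothetical.

```python
# Simplified model of the consistency check `pip3 check` performs. It only
# understands purely numeric versions and the six basic comparison operators;
# real pip implements full PEP 440 via the `packaging` library.

import operator

OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
       "!=": operator.ne, ">": operator.gt, "<": operator.lt}


def parse_version(text):
    # Naive: "1.26.18" -> (1, 26, 18); note "2" and "2.0" compare unequal here.
    return tuple(int(part) for part in text.split("."))


def satisfies(version, spec):
    """True if version meets every clause of a spec such as '>=1.21.1,<2'."""
    for clause in spec.split(","):
        clause = clause.strip()
        for op, compare in OPS.items():  # two-character operators come first
            if clause.startswith(op):
                if not compare(parse_version(version), parse_version(clause[len(op):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


def check(installed, requirements):
    """installed: name -> version; requirements: module -> {dep: spec}."""
    problems = []
    for module, deps in requirements.items():
        for dep, spec in deps.items():
            if dep not in installed:
                problems.append(f"{module} requires {dep}, which is not installed")
            elif not satisfies(installed[dep], spec):
                problems.append(f"{module} requires {dep}{spec}, "
                                f"but {dep} {installed[dep]} is installed")
    return problems


# Hypothetical modules: module-a pins urllib3 below 2, module-b wants 2.x.
reqs = {"module-a": {"urllib3": "<2"}, "module-b": {"urllib3": ">=2"}}
print(check({"urllib3": "1.26.18"}, reqs))
# ['module-b requires urllib3>=2, but urllib3 1.26.18 is installed']
print(check({"urllib3": "2.0.7"}, reqs))
# ['module-a requires urllib3<2, but urllib3 2.0.7 is installed']
```

Whichever urllib3 version is installed, one of the two modules is unhappy, which is why one side has to be held downrev until the blocker is updated.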

The package dependencies I declare tend to be the explicit module dependencies (as shown by pip3 show). Occasionally I'll declare some or all of the optional dependencies in packaging, if the standard use case suggests it. And there's no obvious easy way to emulate the notion of extras in package dependencies. But that can be handled in package overlays, which is the safest way in any case.

Something else the checking can pick up is when a dependency is removed, which is something that can be easily missed.
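Catching a removed dependency (or a newly added one) amounts to diffing the dependencies declared in the OS package against the Requires: line that pip3 show prints. A minimal sketch, assuming the package's declared dependencies are available as a set of names; the sample output and declared set below are illustrative, not taken from a real Tribblix package, and names are only lower-cased rather than fully normalized as pip does.

```python
# Sketch: diff the dependencies declared in an OS package against the
# "Requires:" line printed by `pip3 show`. The sample output and the declared
# set are illustrative, not taken from a real Tribblix package.

def requires_from_pip_show(output):
    """Return the lower-cased names on the 'Requires:' line of pip3 show output."""
    for line in output.splitlines():
        if line.startswith("Requires:"):
            value = line.split(":", 1)[1]
            return {name.strip().lower() for name in value.split(",") if name.strip()}
    return set()


def compare(declared, required):
    return {
        "missing": sorted(required - declared),  # upstream added a dependency
        "stale": sorted(declared - required),    # upstream dropped a dependency
    }


sample = """Name: requests
Version: 2.31.0
Requires: certifi, charset-normalizer, idna, urllib3
Required-by:
"""
declared = {"certifi", "chardet", "idna", "urllib3"}
print(compare(declared, requires_from_pip_show(sample)))
# {'missing': ['charset-normalizer'], 'stale': ['chardet']}
```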

Doing all the checking adds a little extra work up front, but should help remove one class of package breakage.

It seemed like a simple problem to fix The Trouble with Tribbles...

While a bit under the weather last week, I decided to try and fix what at first glance appears to be a simple problem:

need to ship the manpage with exa

Now, exa is a modern file lister, and the package on Tribblix doesn't ship a man page. The reason for that, it turns out, is that there isn't a man page in the source, but you can generate one.

To build the man page requires pandoc. OK, so how to get pandoc, which wasn't available on Tribblix? It's written in Haskell, and I did have a Haskell package.

Only my version of Haskell was a bit old, and wouldn't build pandoc. The build complains that it's too old and unsupported. You can't even build an old version of pandoc, which is a little peculiar.

Off to upgrade Haskell then. You need Haskell to build Haskell, and it has some specific requirements about precisely which versions of Haskell work. I wanted to get to 9.4, which is the last version of Haskell that builds using make (and I'll leave Hadrian for another day). You can't build Haskell 9.4 with 9.2, which it claims is too new; you have to go back to 9.0.

Fortunately we do have some bootstrap kits for illumos available, so I pulled 9.0 from there, successfully built Haskell, then cabal, and finally pandoc.

Back to exa. At which point you notice that it's been deprecated and replaced by eza. (This is a snag with modern point tools. They can disappear on a whim.)

So let's build eza. At which point I find that the MSRV (Minimum Supported Rust Version) has been bumped to 1.70, and I only had 1.69. Another update required. Rust is actually quite simple to package, you can just download the stable version and package it.

After all this, exa still doesn't have a man page, because it's deprecated (if you run man exa you get something completely different from X.Org). But I did manage to upgrade Haskell and Cabal, I managed to package pandoc, I updated rust, and I added a replacement utility - eza - which does now come with a man page.

The Cloud Computer Oxide Computer Company Blog

Today we are announcing the general availability of the world’s first commercial cloud computer — along with our $44M Series A financing.

From the outset at Oxide, and as I outlined in my 2020 Stanford talk, we have had three core beliefs as a company:

  1. Cloud computing is the future of all computing infrastructure.

  2. The computer that runs the cloud should be able to be purchased and not merely rented.

  3. Building a cloud computer necessitates a rack-level approach — and the co-design of both hardware and software.

Of these beliefs, the first is not at all controversial: the agility, flexibility, and scalability of cloud computing have been indisputably essential for many of the services that we depend on in the modern economy.

The degree that the second belief is controversial, however, depends on who you are: for those that are already running on premises due to security, regulatory, economic, or latency reasons, it is self-evident that computers should be able to be purchased and not merely rented. But to others, this has been more of a revelation — and since we started Oxide, we have found more and more people realize that the rental-only model for the cloud is not sustainable. Friends love to tag us on links to VC thinkpieces, CTO rants, or analyst reports on industry trends — and we love people thinking of us, of course (even when being tagged for the dozenth time!) — but the only surprise is how surprising it continues to be for some folks.

The third belief — that the development of a cloud computer necessitates rack-scale design of both hardware and software — may seem iconoclastic to those who think only in terms of software, but it is in fact not controversial among technologists: as computing pioneer Alan Kay famously observed, "people who are really serious about software should make their own hardware." This is especially true in cloud computing, where the large public cloud companies have long ago come to the conclusion that they needed to be designing their own holistic systems. But if this isn’t controversial, why hasn’t there been a cloud computer before Oxide’s? First, because it’s big: to meaningfully build a cloud computer, one must break out of the shackles of the 1U or 2U server, and really think about the rack as the unit of design. Second, it hasn’t been done because it’s hard: co-designing hardware and software that spans compute, networking, and storage requires building an extraordinary team across disparate disciplines, coupling deep expertise with a strong sense of versatility, teamwork, and empathy. And the team isn’t enough by itself: it also needs courage, resilience, and (especially) time.

So the biggest question when we set out was not "is the market there?" or "is this the right way to do it?", but rather could we pull this off?

Pulling it off

We have indeed pulled it off — and it’s been a wild ride! While we have talked about the trek quite a bit on our podcast, Oxide and Friends (and specifically, Steve and I recently answered questions about the rack), our general availability is a good opportunity to reflect on some of the first impressions that the Oxide cloud computer has made upon those who have seen it.

"Where are all the boxes?"

The traditional rack-and-stack approach starts with a sea of boxes arriving with servers, racks, cabling, etc. This amounts to a literal kit car approach — and it starts with tedious, dusty, de-boxing. But the Oxide rack ships with everything installed and comes in just one box — a crate that is its own feat of engineering. All of this serves to dramatically reduce the latency from equipment arrival to power on and first provision — from weeks and months to days or even hours.

"Is it on?"

We knew at the outset that rack-level design would afford us the ability to change the geometry of compute sleds — that we would get higher density in the rack by trading horizontal real estate for vertical. We knew, too, that we were choosing to use 80mm fans for their ability to move more air much more efficiently — so much so that we leveraged our approach to the supply chain to partner with Sanyo Denki (our fan provider) to lower the minimum speed of the fans from 5K RPM to the 2K RPM that we needed. But adding it up, the Oxide rack has a surprising aesthetic attribute: it is whisper quiet. To those accustomed to screaming servers, this is so unexpected that when we were getting FCC compliance, the engineer running the test sheepishly asked us if we were sure the rack was on — when it was dissipating 15 kW! That the rack is quiet wasn’t really deliberate (and we are frankly much more interested in the often hidden power draw that blaring fan noise represents), but it does viscerally embody much of the Oxide differentiation with respect to both rack-level design and approach to the supply chain.
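As a rough, idealized illustration of why fan speed matters for power (this is the general fan affinity law, not a measurement of the Oxide rack), fan shaft power scales approximately with the cube of rotational speed N. Under that idealization, running at 2K rather than 5K RPM would cut fan power at that operating point to roughly 6% of its former value:

```latex
\frac{P_{\text{2K}}}{P_{\text{5K}}} \approx \left(\frac{N_{\text{2K}}}{N_{\text{5K}}}\right)^{3}
  = \left(\frac{2000}{5000}\right)^{3} = 0.4^{3} = 0.064
```

Real fans, and the airflow a given load actually needs, deviate from this ideal, so it indicates scale rather than a measured saving.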

"Where are the cables?"

Anyone accustomed to a datacenter will note the missing mass of cold-aisle cabling that one typically sees at the front of a rack. But moving to the back of the rack reveals only a DC busbar and a tight, cabled backplane. This represents one of the bigger bets we made: we blindmated networking. This was mechanically tricky, but the payoff is huge: capacity can be added to the Oxide cloud computer simply by snapping in a new compute sled — nothing to be cabled whatsoever! This is a domain in which we have leapfrogged the hyperscalers, who (for their own legacy reasons) don’t do it this way. This can be jarring to veteran technologists. As one exclaimed upon seeing the rack last week, "I am both surprised and delighted!" (Or rather: a very profane variant of that sentiment.)

"You did your own switch too?!"

When we first started the company, one of our biggest technical quandaries was what to do about the switch. At some level, both paths seemed untenable: we knew from our own experience that integrating with third-party switches would lead to exactly the kind of integration pain for customers that we sought to alleviate — but it also seemed outrageously ambitious to do our own switch in addition to everything else we were doing. But as we have many times over the course of Oxide, we opted for the steeper path in the name of saving our customers grief, choosing to build our own switch. If it has to be said, getting it working isn’t easy! And of course, building the switch is insufficient: we also needed to build our own networking software — to say nothing of the management network required to be able to manage compute sleds when they’re powered off.

"Wait, that’s part of it?!"

It’s one thing to say that all of the software that one needs to operate the cloud computer is built in — but it’s another to actually see what that software includes. And for many, it’s seeing the Oxide web console (or its live demo!) that really drives the message home: yes, all of the software is included. And because the console implementation is built on the public API, everything that one can do in the console for the Oxide rack is also available via CLI and API — a concrete manifestation of our code-as-contract approach.

"And there’s no separate licensing?"

One common source of pain for users of on-prem infrastructure has been license management: financial pain due to over-paying and under-utilizing, and operational pain in the navigation of different license terms, different expiration dates, unpredictable dependencies, and uncertain vendor futures. From the beginning we knew that we wanted to deliver a delightful, integrated experience: we believe that cloud computers should come complete with all system software built-in, and with no additional licensing to manage or to pay for. Bug fixes and new features are always only an update away and do not require a multi-departmental discussion to determine value and budget.

"It’s all open source?"

While the software is an essential part of the Oxide cloud computer, what we sell is in fact the computer. For a champion of open source like Oxide, this allows a particularly straightforward open source strategy: our software is all open. So you don’t need to worry about hinky open core models or relicensing surprises. And from a user perspective, you are assured levels of transparency that you don’t get in the public cloud, let alone the proprietary on-prem world.

Getting your own first impression

We’re really excited to have the first commercial cloud computer — and for it to be generally available! If you yourself are interested, we look forward to it making its first impression on you — reach out to us!

When zfs was young The Trouble with Tribbles...

On the Solaris 10 Platinum Beta program, one of the most exciting promised features was ZFS, the new file system.

I was especially interested, given that I was in a data-heavy position at the time. The limits of UFS were painful: we had datasets into several terabytes already, and even the multiterabyte file system support that got added was actually pretty useless because the inode density was so low. We tried QFS and SAM-QFS, and they were pretty appalling too.

ZFS was promised, and didn't arrive. In fact, there were about 4 of us on the beta program who saw the original zfs implementation, and it was quite different from what we have now. What eventually landed as zfs in Solaris was a complete rewrite. The beta itself was interesting - we were sent the driver, 3 binaries, and a 3-line cheatsheet, and that was it. There was a fundamental philosophy here that the whole thing was supposed to be so easy to use and sufficiently obvious that it didn't need a manual, and that was actually true. (It's gotten rather more complex since, to be fair.)

The original version was a bit different in terms of implementation than what you're used to, but not that much. The most obvious change was that originally there wasn't a top-level file system for a pool. You created a pool, and then created your file systems. I'm still not sure which is the correct choice. And there was a separate zacl program to handle the ACLs, which were rather different.

In fact, ACLs have been a nightmare of bad implementations throughout their history on Solaris. I already had previous here, having got the POSIX draft ACL implementation reworked for UFS. The original zfs implementation had default aka inheritable ACLs applied to existing objects in a directory. (If you don't immediately realise how bad that is, think of what this allows you to do with hard links to files.) The ACL implementations have continued to be problematic - consider that zfs allows 5 settings for the aclinherit property as evidence that we're glittering a turd at this point.
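The hard-link problem can be made concrete with a toy model. This is not ZFS code: the `Inode` and `Directory` classes and the simple "set of readers" ACL are illustrative assumptions, and it assumes, as traditional UNIX allowed, that a user may create a hard link to a file they do not own. Because an ACL belongs to the inode rather than to a directory entry, pushing a directory's inheritable ACL onto its existing entries also changes access through every other link to that file.

```python
# Toy model (not ZFS code) of why applying a directory's inheritable ACL to
# objects that already exist is dangerous once hard links are involved.
# The classes and the simple "set of readers" ACL are illustrative assumptions.

from dataclasses import dataclass, field


@dataclass
class Inode:
    readers: set = field(default_factory=set)  # ACL lives on the inode


@dataclass
class Directory:
    entries: dict = field(default_factory=dict)  # name -> Inode (hard links)
    default_readers: set = field(default_factory=set)


def set_default_acl_retroactively(directory, readers):
    """The problematic behaviour: push the new default onto existing entries."""
    directory.default_readers = set(readers)
    for inode in directory.entries.values():
        inode.readers = set(readers)  # modifies the shared inode, not a copy


# A file readable only by root.
secret = Inode(readers={"root"})
etc = Directory(entries={"secrets": secret})

# An unprivileged user hard-links it into a directory they control...
home = Directory()
home.entries["innocent"] = secret

# ...then sets a permissive default ACL on that directory.
set_default_acl_retroactively(home, {"root", "everyone"})

print("everyone" in etc.entries["secrets"].readers)  # True: the original is exposed
```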

Eventually we did get zfs shipped in a Solaris 10 update, and it's been continually developed since then. The openzfs project has given the file system an independent existence, it's now in FreeBSD, you can run it (and it runs well) on Linux, and in other OS variations too.

One of the original claims was that zfs was infinitely scalable. I remember it being suggested that you could create a separate zfs file system for each user. I had to try this, so I got together a test system (an Ultra 2 with an A1000 disk array) and started creating file systems. Sure, it got into several thousand without any difficulty, but that's not infinite - think universities or research labs and you can easily have 10,000 or 100,000 users; we had well over 20,000. And it fell apart at that scale. That's before each is an NFS share, too. So that idea didn't fly.

Overall, though, zfs was a step change. The fact that you had a file system that was flexible and easily managed was totally new. The fact that a file system actually returned correct data rather than randomly hoping for the best was years ahead of anything else. Having snapshots that allowed users to recover from accidentally deleted files without waiting days for a backup to be restored dramatically improved productivity. It's win after win, and I can't imagine using anything else for storing data.

Is zfs perfect? Of course not, and to my mind one of the most shocking things is that nothing else has even bothered to try and come close.

There are a couple of weaknesses with zfs (or related to zfs, if I put it more accurately). One is that it's still a single-node file system. While we have distributed storage, we still haven't really matured that into a distributed file system. The second is that while zfs has dragged storage into the 21st century, allowing much more sophisticated and scalable management of data, there hasn't been a corresponding improvement in backup, which is still stuck firmly in the 1980s.

SMF - part of the Solaris 10 legacy The Trouble with Tribbles...

The Service Management Facility, or SMF, integrated extremely late in the Solaris 10 release cycle. We only got one or two beta builds to test, which seemed highly risky for such a key feature.

So there was very little time to gather feedback from users. And something that central really can't be modified once it's released. It had to work first time.

That said, we did manage some improvements. The current implementation of `svcs -x` is largely due to me struggling to work out why a service was broken.

One of the obvious things about SMF is that it relies on manifests written in XML. Yes, that's of its time - there's a lot of software you can date by the file format it uses.

I don't have a particular problem with the use of XML here, to be honest. What's more of a real problem is that the manifest files were presented as a user interface rather than an internal implementation detail, so that users were forced to write XML from scratch with little to no guidance.

There are a lot of good features around SMF.

Just the very basic restart of an application that dies is something that's so blindingly obvious as a requirement in an operating system. So much so that once it existed I refused to support anything that didn't have SMF when I was on call - after all, most of the 3am phone calls were to simply restart a crashed application. And yes, when we upgraded our systems to Solaris 10 with SMF our availability went way up and the on-call load plummeted.
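The basic "restart it when it dies" behaviour is easy to state in code. This is a minimal sketch, not how svc.startd is implemented: the `supervise` function, the retry cap, and the supervised command are assumptions for illustration. Giving up after too many failures is loosely analogous to SMF placing a repeatedly failing service into the maintenance state.

```python
# Minimal sketch of the "restart it when it dies" behaviour the post describes.
# Not how svc.startd is implemented; the retry cap and the supervised command
# are assumptions for illustration.

import subprocess
import sys


def supervise(cmd, max_restarts=3):
    """Run cmd, restarting it after each failed exit, up to max_restarts times.

    Returns (number_of_starts, final_state).
    """
    starts = 0
    while True:
        starts += 1
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return starts, "exited cleanly"
        if starts > max_restarts:
            # Stop retrying and leave it for a human, loosely like SMF's
            # maintenance state.
            return starts, "maintenance"
        print(f"exit status {result.returncode}; restarting", file=sys.stderr)


# A stand-in "application" that always crashes.
crashing = [sys.executable, "-c", "raise SystemExit(1)"]
print(supervise(crashing, max_restarts=2))  # (3, 'maintenance')
```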

Being able to grant privileges to a service, and just within the context of that service, without having to give privileges to an application (eg set*id) or a user, makes things so much safer. Although in practice it's mostly used to let applications bind to privileged ports while running as a regular user, as that's far and away the most common use case.

Dependencies have been a bit of a mixed bag. Partly because working out what the dependencies should be in the first place is just hard to get right, but also because dependency declaration is bidirectional - you can inject a dependency on yourself into another service, and that other service may not respond well, or you can create a circular dependency if the two services are developed independently.
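How an injected declaration can close a cycle is easy to demonstrate with a generic topological sort. This sketch uses Python's standard `graphlib` module (3.9+) rather than anything from SMF, and the service names are hypothetical; in SMF the injection would come from one manifest declaring itself a dependent of another service.

```python
# Sketch using Python's graphlib (3.9+) to show how an injected declaration
# can close a dependency cycle. Service names are hypothetical; this is a
# generic topological sort, not SMF's own logic.

from graphlib import CycleError, TopologicalSorter


def start_order(depends_on):
    """depends_on maps a service to the set of services it needs started first."""
    return list(TopologicalSorter(depends_on).static_order())


# app-db needs network; app-web needs app-db.
services = {"app-db": {"network"}, "app-web": {"app-db"}}
print(start_order(services))  # ['network', 'app-db', 'app-web']

# Another, independently developed manifest injects "app-web before app-db".
services["app-db"].add("app-web")
try:
    start_order(services)
except CycleError as err:
    print("cycle detected:", err.args[1])  # the nodes forming the cycle
```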

One part of dependency management in services is deciding whether a given service should start or not given the state of other services (such as its dependencies). Ideally, you want strict dependency management. In the real world, systems are messy and complicated, the dependency tree isn't terribly well understood, and some failure modes don't matter. And in many cases you want the system to try and boot as far as possible so you can get in and fix it.
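The trade-off between strict and permissive dependency handling can be sketched with groupings loosely modelled on SMF's require_all, require_any, and optional_all. This is a simplification under stated assumptions: the state names, the `can_start` function, and the rule that optional_all only waits on dependencies still starting are illustrative, and svc.startd's real rules cover more states and cases.

```python
# Loose sketch of SMF-style dependency groupings; svc.startd's real rules
# handle more states and cases. Service names and states are hypothetical.

def can_start(deps, state):
    """deps: list of (grouping, [service, ...]); state: service -> state name.

    States used here: 'online', 'starting', 'disabled', 'maintenance';
    a service missing from `state` is treated as 'absent'.
    """
    for grouping, names in deps:
        states = [state.get(name, "absent") for name in names]
        if grouping == "require_all":
            ok = all(s == "online" for s in states)  # strict: all must be up
        elif grouping == "require_any":
            ok = any(s == "online" for s in states)  # at least one must be up
        elif grouping == "optional_all":
            # Permissive: only wait for dependencies that may still come up;
            # disabled, broken, or absent ones don't block the service.
            ok = all(s != "starting" for s in states)
        else:
            raise ValueError(f"unsupported grouping: {grouping}")
        if not ok:
            return False
    return True


state = {"network": "online", "nfs-client": "maintenance"}
print(can_start([("require_all", ["network"]),
                 ("optional_all", ["nfs-client"])], state))     # True
print(can_start([("require_all", ["network", "nfs-client"])], state))  # False
```

Declaring the broken dependency as optional lets the system keep booting so you can get in and fix it; declaring it as required stops the dependent service cold.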

A related problem is that we've ended up with a complex mesh of services because someone had to take the old mess of rc scripts and translate them into something that would work on day 1. And nobody - either at the time or since - has gone through the services and studied whether the granularity is correct. Another thing that has never happened is, once we had a good handle on what services there are, to look at whether the services we have are sensible, or whether there's an opportunity to rearchitect the system to do things better. And because all these services are now baked into SMF, it's actually quite difficult to do any major reworking of the system.

Not only that, but when people write SMF manifests, they simply copy something that looks similar to the problem at hand, so bad practices and inappropriate dependency declarations multiply.

This is one example of what I see as the big problem with SMF - we haven't got supporting tools that present the administrator with useful abstractions, so everything is exposed raw.

In terms of configuration management, SMF is very much a mixed bag. Yes, it guarantees a consistent and reproducible state of the system. The snag is that there isn't really an automated way to capture the essential state of a system and generate something that will reproduce it (either later or elsewhere) - it can be done, but it's essentially manual. (Backing up the state is a subset of this problem.)

It's clear that there were plans to extend the scope of SMF: essentially, to make it the Solaris version of the Windows registry. Thankfully (see also systemd for where this goes wrong) that hasn't happened much.

In fact, SMF hasn't really evolved in any material sense since the day it was introduced. It's very much stuck in time.

There were other features that were left unfinished. For example, there's the notion of the scope of SMF, and the only one available right now is the "localhost" scope - see the smf(7) manual in illumos - so in theory there could be other, non-localhost, scopes. And there was the notion of monitor methods, which never appeared, but which I can imagine solving a range of niggling application issues I've seen over the years.


The Final Lesson Z In ASCII - Writing

The final lesson my father taught me.

Tracing Kernel Functions: How the illumos AMD64 FBT Provider Intercepts Function Calls Z In ASCII - Writing

A line-by-line breakdown of how the illumos AMD64 FBT provider intercepts function calls.

Debugging a Zig Test Failure Z In ASCII - Writing

Debugging a Zig test failure.

Difficult Hardware Nahum Shalman

This content can also be found as part of https://github.com/tinkerbell/ipxedust/pull/88 in DifficultHardware.md.

Most modern hardware is capable of PXE booting just fine. Sometimes strange combinations of different NIC hardware / firmware connected to specific switches can misbehave.

In those situations you might want to boot into a build of iPXE but completely sidestep the PXE stack in your NIC firmware.

We already ship ipxe.iso that can be used in many situations, but most of the time that requires either an active connection from a virtual KVM client or network access from the BMC to a storage target hosting the ISO.

Some BMCs support uploading a floppy image into BMC memory and booting from that. To support that use case, we have started packaging our EFI build into a bootable floppy image.

For other projects or use cases that wish to replicate this functionality, with the appropriate versions of qemu-img, dosfstools and mtools you can build something similar yourself from upstream iPXE like so:

# create a 1440K raw disk image
qemu-img create -f raw ipxe-efi.img 1440K
# format it with an MBR and a FAT12 filesystem
mkfs.vfat --mbr=y -F 12 -n IPXE ipxe-efi.img

# Create the EFI expected directory structure
mmd -i ipxe-efi.img ::/EFI
mmd -i ipxe-efi.img ::/EFI/BOOT

# Copy ipxe.efi as the default x86_64 efi boot file
curl -LO https://boot.ipxe.org/ipxe.efi
mcopy -i ipxe-efi.img ipxe.efi ::/EFI/BOOT/BOOTX64.efi
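Before uploading the result to a BMC, a quick sanity check of the image may be worthwhile. This Python sketch is not part of ipxedust or iPXE, and the function names are hypothetical. It checks the image size and the fields mkfs.vfat should have written into the first sector. Those are the 0x55AA boot signature, the FAT12 type string at offset 54 and the 11-byte volume label at offset 43, which are the standard FAT12/FAT16 boot sector offsets.

```python
import os
from typing import List

SECTOR = 512

def check_boot_sector(sector: bytes, expected_label: str = "IPXE") -> List[str]:
    """Return problems found in a FAT12 boot sector (an empty list means OK)."""
    if len(sector) < SECTOR:
        return ["image is shorter than one 512-byte sector"]
    problems = []
    if sector[510:512] != b"\x55\xaa":
        problems.append("missing 0x55AA boot signature")
    # BS_FilSysType: 8 bytes at offset 54 in a FAT12/FAT16 boot sector
    fs_type = sector[54:62].decode("ascii", "replace").strip()
    if fs_type != "FAT12":
        problems.append(f"filesystem type is {fs_type!r}, expected 'FAT12'")
    # BS_VolLab: 11 bytes at offset 43, padded with spaces
    label = sector[43:54].decode("ascii", "replace").strip()
    if label != expected_label:
        problems.append(f"volume label is {label!r}, expected {expected_label!r}")
    return problems

def check_image(path: str, expected_size: int = 1440 * 1024) -> List[str]:
    """Check the image size (1440K from qemu-img) and its first sector."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        first = f.read(SECTOR)
    problems = [] if size == expected_size else [f"size is {size}, expected {expected_size}"]
    return problems + check_boot_sector(first)
```

For example, calling check_image("ipxe-efi.img") after running the commands above should return an empty list.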

As of writing, other projects are working on automating the upload of this floppy to a BMC. See draft PR https://github.com/bmc-toolbox/bmclib/pull/347

Retiring isaexec in Tribblix The Trouble with Tribbles...

One of the slightly unusual features in illumos (and in Solaris, because that's where it came from) is isaexec.

This facility allows you to have multiple implementations of a binary, and then isaexec will select the best one (for some definition of best).

The full implementation allows you to select from a wide range of architectures. On my machine it'll allow the following list:

amd64 pentium_pro+mmx pentium_pro
pentium+mmx pentium i486 i386 i86

If you wanted, you could ship a highly tuned pentium_pro binary, and eke out a bit more performance.
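Conceptually, isaexec walks that list in order, best first. In the directory the program was run from, it looks for a subdirectory named after each ISA and executes the first match it finds. Here's a small Python model of that search, based on the behaviour described in the isaexec(3C) manual page. It only picks a path and doesn't exec anything. The is_executable hook is a hypothetical addition so the search can be tried without a real filesystem.

```python
import os
from typing import Callable, List, Optional

def isaexec_select(program: str, isalist: List[str],
                   is_executable: Callable[[str], bool] = lambda p: os.access(p, os.X_OK)
                   ) -> Optional[str]:
    """Return the first ISA-specific variant of `program` that exists.

    `program` is the path the wrapper was run as, e.g. /usr/bin/foo;
    candidates are /usr/bin/<isa>/foo, tried in isalist order (best first).
    """
    directory, name = os.path.split(program)
    for isa in isalist:
        candidate = os.path.join(directory, isa, name)
        if is_executable(candidate):
            return candidate
    return None  # nothing suitable installed

isalist = "amd64 pentium_pro+mmx pentium_pro pentium+mmx pentium i486 i386 i86".split()
installed = {"/usr/bin/i86/foo", "/usr/bin/amd64/foo"}
print(isaexec_select("/usr/bin/foo", isalist, installed.__contains__))  # /usr/bin/amd64/foo
```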

The common case, though, and it's actually the only way isaexec is used in illumos, is to simply choose between a 32-bit and 64-bit binary. This goes back to when Solaris and illumos supported 32-bit and 64-bit hardware in the same system (and you could actually choose whether to boot 32-bit or 64-bit under certain circumstances). In this case, if you're running a 32-bit kernel you get a 32-bit application; if you're running 64-bit then you can get the 64-bit version of that application.

Not all applications got this treatment. Anything that needed to interface directly with the kernel did (eg the ps utility). And for others it was largely about performance or scalability. But most userland applications were 32-bit, and still are in illumos. (Solaris has migrated most to 64-bit now; we ought to do the same.)

It's been 5 years or more since illumos removed the 32-bit kernel, so the only option is to run in 64-bit mode. So now, isaexec will only ever select the 64-bit binary.

A while ago, Tribblix simply removed the remaining 32-bit binaries that isaexec would have executed on a 32-bit system. This saved a bit of space.

The upcoming m32 release goes further. In almost all cases isaexec is no longer involved, and the 64-bit binary sits directly in the PATH (eg, in /usr/bin). There's none of the wasted redirection. I have put symbolic links in, just in case somebody explicitly referenced the 64-bit path.

This is all done by manipulating packaging - Tribblix runs the IPS package repo through a transformation step to produce the SVR4 packages that the distro uses, and this is just another filter in that process.
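As an illustration of what such a filter might do, here's a Python sketch over a hypothetical in-memory model of a package, mapping each path to a kind and payload. This is not the actual Tribblix transformation, and real SVR4 prototype entries look different. The sketch does four things:

- it promotes the 64-bit binary into the wrapper's place;
- it leaves a symbolic link at the old amd64 path;
- it drops any 32-bit leftover;
- it skips paths listed in keep, which stands in for packages that must retain both variants.

```python
from typing import AbstractSet, Dict, Tuple

# Hypothetical model of a package: path -> (kind, payload), where kind is
# "file", "hardlink" or "symlink" and payload is a content id or link target.
Entry = Tuple[str, str]
ISAEXEC = "usr/lib/isaexec"

def drop_isaexec(entries: Dict[str, Entry],
                 keep: AbstractSet[str] = frozenset()) -> Dict[str, Entry]:
    """Replace isaexec wrappers with the 64-bit binary they would have run."""
    out = dict(entries)
    for path, (kind, payload) in entries.items():
        if kind != "hardlink" or payload != ISAEXEC or path in keep:
            continue
        directory, name = path.rsplit("/", 1)
        amd64 = f"{directory}/amd64/{name}"
        if amd64 not in entries:
            continue  # no 64-bit variant to promote; leave the wrapper alone
        out[path] = entries[amd64]                # 64-bit binary now sits in PATH
        out[amd64] = ("symlink", f"../{name}")    # old explicit paths still work
        out.pop(f"{directory}/i86/{name}", None)  # drop any 32-bit leftover
    return out

pkg = {
    "usr/bin/foo": ("hardlink", ISAEXEC),
    "usr/bin/amd64/foo": ("file", "amd64-binary"),
    "usr/bin/i86/foo": ("file", "i86-binary"),
    "usr/bin/bar": ("file", "bar-binary"),
}
print(drop_isaexec(pkg))
```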

(There are a handful of exceptions where I still have 32-bit and 64-bit. Debuggers, for example, might need to match the bitness of the application being debugged. And the way that sh/ksh/ksh93 is installed needs a slightly less trivial transformation to get it right.)

Modernizing scripts in Tribblix The Trouble with Tribbles...

It's something I've been putting off for far too long, but it's about time to modernize all the shell scripts that Tribblix is built on.

Part of the reason it's taken this long is the simple notion of, if it ain't broke, don't fix it.

But some of the scripting was starting to look a bit ... old. Antiquated. Prehistoric, even.

And there's a reason for that. Much of the scripting involved in Tribblix is directly derived from the system administration scripts I've been using since the mid-1990s. That involved managing Solaris systems with SVR4 packages, and when I built a distribution derived from OpenSolaris, using SVR4 packages, I just lifted many of my old scripts verbatim. And even new functionality was copied or slightly modified.

Coming from Solaris 2.3 through 10, this meant that they were very strictly Bourne Shell. A lot of the capabilities you might expect in a modern shell simply didn't exist. And much of the work was to be done in the context of installation (i.e. Jumpstart) where the environment was a little sparse.

The most obvious code smell is extensive use of backticks rather than $(). Some of this I've refactored over time, but looking at the code now, not all that much.

One push for this was adding ShellCheck to Tribblix (it was a little bit of a game getting Haskell and Cabal to play nice, but I digress).

Running ShellCheck across all my scripts gave it a lot to complain about. Some of the complaints are justified, although many aren't (it's very enthusiastic about quoting everything in sight, even when that would be completely wrong).

But generally it's encouraged me to clean the scripts up. It's even managed to find a bug, and inspecting the code it considers rubbish has turned up a few more.

The other push here is to speed things up. Tribblix is often fairly quick in comparison to other systems, but it's not quick enough for me. But more of that story later.

Scribbled Dummy Load Blueprints Josef "Jeff" Sipek

Yesterday, I saw KM1NDY’s blog post titled Scribbled Antenna Blueprints. I wasn’t going to comment…but here I am. :)

I thought I’d set up a similar contraption (VHF instead of HF) to see what exactly happens. I have a 1 meter long RG-8X jumper with BNC connectors, a BNC T, and a NanoVNA with a 50Ω load calibration standard.

But first, let’s analyze the situation!

Imagine you have a transmitter/signal generator and you connect it to a dummy load. Assuming ideal components, absolutely nothing would get radiated. Now, imagine inserting an open stub between the two. In other words, the T has the following connections:

  1. the generator
  2. 50Ω load
  3. frequency-dependent impedance (the stub)

Let’s do trivial math! Let’s call the total load that the generator sees $Z_{\text{total}}$ and the impedance provided by the stub $Z_{\text{stub}}$. From the generator side of the T, the other two ports are connected in parallel. Therefore:

$$Z_{\text{total}} = \frac{50 \cdot Z_{\text{stub}}}{50 + Z_{\text{stub}}}$$

So, when would we get a 1:1 SWR? When the generator sees a 50Ω load. When will it see 50Ω? When $Z_{\text{stub}}$ is very large, the extreme case being when that side of the T is open.
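To put numbers on that, here's a short Python sketch that assumes ideal components and a 50Ω system. It uses the standard reflection coefficient Γ = (Z - 50)/(Z + 50) and the relation SWR = (1 + |Γ|)/(1 - |Γ|). For a purely resistive Z_stub the result works out to 1 + 50/Z_stub, so the SWR only approaches 1:1 as Z_stub becomes very large.

```python
def parallel(z1: complex, z2: complex) -> complex:
    """Two impedances in parallel, as the generator sees the T's other ports."""
    return z1 * z2 / (z1 + z2)

def swr(z_load: complex, z0: float = 50.0) -> float:
    """SWR on a z0 line terminated in z_load, via the reflection coefficient."""
    gamma = abs((z_load - z0) / (z_load + z0))
    if gamma >= 1:
        return float("inf")  # total reflection, e.g. a short
    return (1 + gamma) / (1 - gamma)

# The 50 ohm dummy load in parallel with increasingly large stub impedances.
for z_stub in (50, 500, 5000, 50000):
    print(f"Z_stub = {z_stub:>6} ohm -> SWR {swr(parallel(50, z_stub)):.3f}:1")
```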

If you are a ham, you may remember from when you were studying for the Amateur Extra exam that transmission line stubs can transform impedance. A 1/2 wave stub “copies” the impedance. A 1/4 wave stub “inverts” the impedance. For this “experiment” we need a high impedance. We can get that by either:

  1. open 1/2 wave stub
  2. shorted 1/4 wave stub

Since the “design” from the scribble called for an open, we’ll focus on the 1/2 wave open stub.
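The copying and inverting behaviour falls out of the standard input impedance formula for a lossless transmission line. The Python sketch below assumes an ideal, lossless 50Ω line and expresses length as a fraction of a wavelength in the cable. Real RG-8X has loss, which keeps the impedance at resonance finite.

```python
import math

def line_input_impedance(z_load: complex, z0: float, fraction: float) -> complex:
    """Lossless line `fraction` of a wavelength long, terminated in z_load."""
    t = math.tan(2 * math.pi * fraction)  # tan(beta * l)
    return z0 * (z_load + 1j * z0 * t) / (z0 + 1j * z_load * t)

def open_stub(z0: float, fraction: float) -> complex:
    """Limit of the above as z_load -> infinity: -j * z0 * cot(beta * l)."""
    return -1j * z0 / math.tan(2 * math.pi * fraction)

def shorted_stub(z0: float, fraction: float) -> complex:
    """The above with z_load = 0: j * z0 * tan(beta * l)."""
    return 1j * z0 * math.tan(2 * math.pi * fraction)

print(abs(open_stub(50, 0.5)))              # enormous: an open 1/2 wave looks open
print(abs(shorted_stub(50, 0.25)))          # enormous: a shorted 1/4 wave looks open
print(line_input_impedance(100, 50, 0.5))   # approximately 100: 1/2 wave copies
print(line_input_impedance(100, 50, 0.25))  # approximately 25: 1/4 wave inverts (50**2 / 100)
```

An open 1/2 wave stub and a shorted 1/4 wave stub both come out enormous, i.e. effectively open. That is what the T needs in order to leave the 50Ω load undisturbed.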

Now, back to the experiment. I have a 1 m long RG-8X which has a velocity factor of 0.78. So, let’s calculate the frequency for which it is a 1/2 wave—i.e., the frequency where the wavelength is 2 times the length of the coax:

$$f = \frac{0.78 \cdot c}{2 \cdot 1\,\text{m}}$$

This equals 116.9 MHz. So, we should expect 1:1 SWR at 117-ish MHz. (The cable is approximately 1 m long and the connectors and the T add some length, so it should be a bit under 117 MHz.)

Oh look! 1.015:1 SWR at 110.5 MHz.

(Using 1.058 m in the calculation yields 110.5 MHz. I totally believe that between the T and the connectors there is close to 6 cm of extra (electrical) length.)
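The arithmetic above is easy to redo for other lengths. Here's a minimal Python version, assuming the 0.78 velocity factor quoted for RG-8X.

```python
C = 299_792_458.0  # speed of light in m/s

def half_wave_frequency(length_m: float, velocity_factor: float) -> float:
    """Frequency (Hz) at which a line of this physical length is 1/2 wave."""
    # Half a wave means the wavelength in the cable is twice its length.
    return velocity_factor * C / (2 * length_m)

def electrical_length(freq_hz: float, velocity_factor: float) -> float:
    """Physical length (m) that would be 1/2 wave at freq_hz."""
    return velocity_factor * C / (2 * freq_hz)

print(f"{half_wave_frequency(1.0, 0.78) / 1e6:.1f} MHz")    # 116.9 MHz
print(f"{half_wave_frequency(1.058, 0.78) / 1e6:.1f} MHz")  # 110.5 MHz
print(f"{electrical_length(110.5e6, 0.78):.3f} m")          # 1.058 m
```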

But wait a minute, you might be saying, if high impedance is the same as an open, couldn’t we just remove the coax stub from the T and get the same result? Yes! Here’s what the NanoVNA shows with the coax disconnected:

The SWR is 1.095:1 at 110.5 MHz and is better than 1.2:1 across the whole 200 MHz! And look at that impedance! It’s about 50Ω across the whole sweep as well!

We can simplify the circuit even more: since we’re only using 2 ports of the T, we can take the T out and connect the 50Ω load to the NanoVNA directly. We just saved $3 from the bill of materials for this “antenna”!

(In case it isn’t obvious, the previous two paragraphs were dripping with sarcasm, as we just ended up with a dummy load connected to the generator/radio and called it an antenna.)

Will It Antenna?

How could a dummy load transmit and receive signals? Glad you asked. In the real world we don’t use ideal components. There are small mismatches between connectors, the characteristic impedance of the coax is likely not exactly 50Ω, the coax shield is not quite 100%, the transmitter’s/generator’s output isn’t exactly 50Ω, and so on.

However, I expect that none of these imperfections amount to anything that will turn this contraption into an antenna. I bet that the ham who suggested this design used an old piece of coax which had even worse characteristics than the “within manufacturing tolerances” specs you get when the coax is new. Another option is that the coax is supposed to be connected in some non-standard way. Mindy accidentally found one as she was packing up, when she disconnected the shield but not the center conductor. Either way, this would make the coax not a 1/2 wave open stub, and the resulting impedance mismatch would cause the whole setup to radiate.

I’d like to thank Mindy for posting about this design. It provided me with a fun evening “project” and a reason to write another blog post.

Finally, I’ll leave you with a photo of my experimental setup.

Speed up zone installation with this one weird trick The Trouble with Tribbles...

Sadly, the trick described below won't work in current releases of Solaris, or any of the illumos distributions. But back in the day, it was pretty helpful.

In Solaris 10, we had sparse root zones, which shared /usr with the global zone. Not only did that save space, because you didn't need a copy of all the files, but it also made creating them much quicker, because you didn't need to take the time to copy all those files.

Zone installation for sparse root zones was typically about 3 minutes for us - this was 15 years ago, so mostly spinning rust and machines a bit slower than we're used to today.

That 3 minutes sounds quick, but I'm an impatient soul, and so were my users. Could I do better?

Actually, yes, quite a bit. What's contributing to that 3 minutes? There's a bit of adding files (the /etc and /var filesystems are not shared, for reasons that should be fairly obvious). And you need to copy the packaging metadata. But that's just a few files.

Most of the time was taken up by building the contents file, which simply lists all the installed files and what package they're in. It loops over all the packages, merging the files from each package into the contents file, which therefore grows every time a package is processed.

The trick was to persuade it to process the packages in an optimal order. You want to do all the little packages first, so that the contents file stays small as long as possible.
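A rough model shows why the order matters so much. It makes one simplifying assumption: merging a package costs time proportional to the size of the contents file once that package has been added. On that assumption, the total work is the sum of the running totals. That sum is smallest when the packages are taken smallest first, by the same argument as shortest-job-first scheduling. The package sizes below are invented.

```python
from itertools import accumulate
from typing import Iterable

def merge_cost(sizes: Iterable[int]) -> int:
    """Total work if each package merge rewrites the whole contents file.

    After merging the k-th package the file holds the first k sizes
    added together, so the total is the sum of the running totals.
    """
    return sum(accumulate(sizes))

sizes = [900, 40, 10, 5, 3, 2]    # hypothetical files per package
print(merge_cost(sizes))          # largest first: 5663
print(merge_cost(sorted(sizes)))  # smallest first: 1057
```

Only the last term, the full contents file, is the same in both orders. Everything before it is where the small-first ordering saves time.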

And the way to do that was to recreate the /var/sadm/pkg directory. It was obvious that it was simply reading the directory and processing packages in the order that it found them. And, on ufs, this is the order that the packages were added to the directory. So what I did was move the packages to one side, create an empty /var/sadm/pkg, and move the package directories back in size order (which you can get fairly easily by looking at the size of the spooled pkgmap files).
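The directory shuffle might look something like this Python sketch. It's hypothetical, not the script actually used. The location of each package's spooled pkgmap (save/pspool/<pkg>/pkgmap) is an assumption about the Solaris 10 layout. The sketch also preserves the directory mode and timestamps but doesn't restore ownership.

```python
import os
import shutil
from typing import List

def reorder_pkg_dir(pkgdir: str,
                    pkgmap_rel: str = "save/pspool/{pkg}/pkgmap") -> List[str]:
    """Recreate pkgdir with its package directories re-added smallest first.

    Only helps on filesystems (like ufs) that list directory entries in the
    order they were created. Returns the order used.
    """
    def pkgmap_size(pkg: str) -> int:
        path = os.path.join(pkgdir, pkg, pkgmap_rel.format(pkg=pkg))
        return os.path.getsize(path) if os.path.isfile(path) else 0

    # Sort by pkgmap size, breaking ties by name so the result is repeatable.
    order = sorted(os.listdir(pkgdir), key=lambda p: (pkgmap_size(p), p))
    aside = pkgdir.rstrip("/") + ".aside"
    os.rename(pkgdir, aside)        # move the packages to one side
    os.mkdir(pkgdir)                # create an empty directory...
    shutil.copystat(aside, pkgdir)  # ...with the original mode and times
    for pkg in order:               # move the packages back, smallest first
        shutil.move(os.path.join(aside, pkg), os.path.join(pkgdir, pkg))
    os.rmdir(aside)
    return order
```

It's only worth doing on ufs, since zfs doesn't list directory entries in creation order.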

This doesn't quite mean that the packages get processed in strict size order, as the install is done in dependency order, but subject to the declared dependencies it otherwise does them in size order.

The results were quite dramatic - with no other changes, this took zone install times from the original 3 minutes to 1 minute. Much happier administrators and users.

This trick doesn't work at all on zfs, sadly, because zfs doesn't simply create a linear list of directory entries and put new ones on the end.

And all this is irrelevant for anything using IPS packaging, which doesn't do sparse-root zones anyway, and is a completely different implementation.

And even in Tribblix, which does have sparse-root zones like Solaris 10 did, and uses SVR4 packaging, the implementation is orders of magnitude quicker because I just create the contents file in a single pass, so a sparse zone in Tribblix can install in a second or so.
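For comparison, building the contents file in a single pass might be sketched like this in Python. It's an illustration rather than the Tribblix code, and it assumes each package's list of paths is already to hand. It also emits a greatly simplified line format; the real contents file also records type, class, mode, owner and so on. The SUNW package names are just examples.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

def build_contents(pkgmaps: Dict[str, Iterable[str]]) -> List[str]:
    """Build a contents-style listing in one pass over all packages.

    `pkgmaps` maps a package name to the paths it delivers. Instead of
    merging each package into an ever-growing file, collect everything,
    sort once, and emit each path with the packages that own it.
    """
    owners: Dict[str, List[str]] = defaultdict(list)
    for pkg, paths in pkgmaps.items():
        for path in paths:
            owners[path].append(pkg)
    return [f"{path} {' '.join(sorted(pkgs))}" for path, pkgs in sorted(owners.items())]

lines = build_contents({
    "SUNWcsu": ["/usr/bin/ls", "/usr/bin"],
    "SUNWcsr": ["/etc/passwd"],
    "SUNWgnu": ["/usr/bin/gls", "/usr/bin"],
})
print("\n".join(lines))
```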

Remnants of closed code in illumos The Trouble with Tribbles...

One of the annoying issues with illumos has been the presence of a body of closed binaries - things that, for some reason or other, were never able to be open sourced as part of OpenSolaris.

Generally the illumos project has had some success in replacing the closed pieces, but what's left isn't quite zero. It took me a little while to work out what's still left, but as of today the list is:

etc/security/tsol/label_encodings.gfi.single
etc/security/tsol/label_encodings.example
etc/security/tsol/label_encodings.gfi.multi
etc/security/tsol/label_encodings
etc/security/tsol/label_encodings.multi
etc/security/tsol/label_encodings.single
usr/sbin/chk_encodings
usr/xpg4/bin/more
usr/lib/raidcfg/mpt.so.1
usr/lib/raidcfg/amd64/mpt.so.1
usr/lib/iconv/646da.8859.t
usr/lib/iconv/8859.646it.t
usr/lib/iconv/8859.646es.t
usr/lib/iconv/8859.646fr.t
usr/lib/iconv/646en.8859.t
usr/lib/iconv/646de.8859.t
usr/lib/iconv/646it.8859.t
usr/lib/iconv/8859.646en.t
usr/lib/iconv/8859.646de.t
usr/lib/iconv/iconv_data
usr/lib/iconv/646fr.8859.t
usr/lib/iconv/8859.646da.t
usr/lib/iconv/646sv.8859.t
usr/lib/iconv/8859.646.t
usr/lib/iconv/646es.8859.t
usr/lib/iconv/8859.646sv.t
usr/lib/fwflash/verify/ses-SUN.so
usr/lib/fwflash/verify/sgen-SUN.so
usr/lib/fwflash/verify/sgen-LSILOGIC.so
usr/lib/fwflash/verify/ses-LSILOGIC.so
usr/lib/labeld
usr/lib/locale/POSIX
usr/lib/inet/certlocal
usr/lib/inet/certrldb
usr/lib/inet/amd64/in.iked
usr/lib/inet/certdb
usr/lib/mdb/kvm/amd64/mpt.so
usr/lib/libike.so.1
usr/lib/amd64/libike.so.1
usr/bin/pax
platform/i86pc/kernel/cpu/amd64/cpu_ms.GenuineIntel.6.46
platform/i86pc/kernel/cpu/amd64/cpu_ms.GenuineIntel.6.47
lib/svc/manifest/network/ipsec/ike.xml
kernel/kmdb/amd64/mpt
kernel/misc/scsi_vhci/amd64/scsi_vhci_f_asym_lsi
kernel/misc/scsi_vhci/amd64/scsi_vhci_f_asym_emc
kernel/misc/scsi_vhci/amd64/scsi_vhci_f_sym_emc
kernel/strmod/amd64/sdpib
kernel/drv/amd64/adpu320
kernel/drv/amd64/atiatom
kernel/drv/amd64/usbser_edge
kernel/drv/amd64/sdpib
kernel/drv/amd64/bcm_sata
kernel/drv/amd64/glm
kernel/drv/amd64/intel_nhmex
kernel/drv/amd64/lsimega
kernel/drv/amd64/marvell88sx
kernel/drv/amd64/ixgb
kernel/drv/amd64/acpi_toshiba
kernel/drv/amd64/mpt
kernel/drv/adpu320.conf
kernel/drv/usbser_edge.conf
kernel/drv/mpt.conf
kernel/drv/intel_nhmex.conf
kernel/drv/sdpib.conf
kernel/drv/lsimega.conf
kernel/drv/glm.conf

Actually, this isn't much. In terms of categories:

Trusted, which includes those label_encodings, and labeld. Seriously, nobody can realistically run trusted on illumos (I have, it's ... interesting). So these don't really matter.

The iconv files actually go with the closed iconv binary, which we replaced ages ago, and our copy doesn't and can't use those files. We should simply drop those (they will be removed in Tribblix next time around).

There's a set of files connected to IKE and IPsec. We should replace those, although I suspect that modern alternatives for remote access will start to obsolete all this over time.

The scsi_vhci files are to get multipathing correctly set up on some legacy SAN systems. If you have to use such a SAN, then you need them. If not, then you're in the clear.

There are a number of drivers. These are mostly somewhat aged. The sdp stuff is being removed anyway as part of IPD29, so that'll soon be gone. Chances are that very few people will need most of these drivers, although mpt was fairly widely used (there was an open mpt replacement in the works). Eventually the need for the drivers will dwindle to zero as systems with that hardware no longer exist (and, by the same token, we wouldn't need them for something like an aarch64 port).

Which just leaves 2 commands.

Realistically, the XPG4 more could be replaced by less. The standard was based on the behaviour of less, after all. I'm tempted to simply delete /usr/xpg4/bin/more and make it a link to less and have done with it.

As for pax, it's required by POSIX, but to be honest I've never used it, haven't seen anywhere that uses it, and read support is already present in things like libarchive and gtar. The heirloom pax is probably more than good enough.

In summary, illumos isn't quite fully open source, but it's pretty close and for almost all cases we could put together a fully functional open subset that'll work just fine.

Static Site Generators The Trouble with Tribbles...

The current Tribblix website is a bit of a hack. Technically it's using a static site generator - a simple home-grown script that constructs pages from a bit of content and boilerplate - but I wanted to be able to go a bit further.

I looked at a few options - and there are really a huge number of them - such as Hugo and Zola. (Both are packaged for Tribblix now, by the way.)

In the end I settled on nanoc. That's packaged too (and I finally got around to having a very simple - rather naive - way of packaging gems).

Why nanoc, though? In this case it was really because it could take the html page fragments I already had and create the site from those, and after tweaking it slightly I ended up with exactly the same html output as before.

Other options might be better if I was starting from scratch, but it would have been much harder to retain the fidelity of the existing site.

One advantage of the new system is that I can put the site under proper source control, so the repo is here.

There's still a lot of work to be done on filling out the content, but it should be easier to evolve the Tribblix website in future.

Building Big Systems with Remote Hardware Teams Oxide Computer Company Blog

The product we’re building, a rack-scale computer, is specifically designed to be a centralized, integrated product because that’s what our customers need. This requirement and the design choices we’ve made to meet this need create some daily efficiency challenges for our team. As a remote-first company, we’re designing this product with team members (including the hardware team) across most North American time zones and even multiple continents, so a large portion of our team is not going into the office/lab every day for hands-on access to "production" hardware. At first blush, the design of our product and the design of our team appear to conflict at some level: we value remote work, but we can’t ship entire racks to the homes of our teammates for both practical and economic reasons.

Our racks are rather inconvenient for a home installation: they are over 2.3 m (7.7') tall, very heavy, and have 3-phase power inputs that aren’t usable in a typical residential setting. Aside from the logistical challenges of a home installation, there’s also the actual cost: these are expensive, and outfitting each remote team member with a full, or even partially populated, rack is economically infeasible. Further, a racked target is not terribly useful for development, as accessing one for debugging is challenging: we have no externally accessible debugging interfaces or other things that can be repurposed as such, because our customers don’t want that stuff! We can (and do!) travel some to get hands-on with a full system, but it became clear early on in the development cycle that we needed more convenient ways of being productive remotely.

Remote productivity on this design is a multi-faceted problem and the solution includes robust remote access to fully-built and partially built systems back at HQ, but that alone does not address all the needs.

This post will deal more with the philosophy we have developed around our non-product board designs as we’ve learned what works for us on our journey through remote development. Some of these tools have become pivotal in increasing our remote efficiency, especially early in the design cycle when the "real" systems weren’t very functional and remote accessibility was limited. For more board-by-board specifics, check out a great Oxide and Friends episode where we talked through the genesis of many of these designs. With many of our team members who designed these boards on-hand, it was a great discussion and a lot of fun talking about the role prototypes have played in facilitating our actual product design.

Not a distraction from "real" product design

We fully believe that these small designs, most of which end up taking around a week of engineering time to design, have radically accelerated or augmented our "real" product designs. I detail a couple of specific examples of how this prototype hardware helped us, from enabling software work before any "real" hardware existed, to prototyping circuits like our multiplexed QSPI flash design. Specifically for the QSPI design, the initial circuit just did not work like we expected and using these boards we were able to quickly (and inexpensively!) iterate on the design, directly informing the work on our "real" designs, and in this case, likely saving a spin of our production hardware that wouldn’t have worked. We were even able to connect our SPI mux to extant development hardware from AMD and validate our assumptions before building a server sled. The Oxide and Friends episode mentioned above covers some of these and other stories in more detail.

Our team fully embraces toolmaking up and down the stack: it informs many of our design choices and directions. Bryan recently gave a talk on the concept, and this is yet another application of it. Just like software teams build tools to help build software, we’re building hardware tools to help build hardware and software.

To emphasize how pervasive this culture is in our company, Dan made a great point during the Oxide and Friends chat:

Anyone in the company is empowered to do this.

We don’t need approval or sign-off, we just go do what’s right for Oxide, and I think this quote from Aaron really sums up our team’s viewpoint:

Investments in tools pay off long-term and often faster than you’d think!

We’ve seen time and time again the effort put into these small boards has paid back big dividends in team productivity, development ease, and bug chasing.

Why we needed custom hardware vs off-the-shelf development boards

There are multiple aspects to our need for custom hardware. First, the custom designs supplement our use of off-the-shelf (OTS) hardware. We use many off-the-shelf development boards and even provide support for a number of these boards in Hubris. These are great for many use-cases, but less great when we are trying to model specific circuits or subsystems of our product designs. Second, we have numerous examples of custom boards that were built simply because we could find no useful OTS alternative: boards like the Dimmlet (I2C access to DDR4 SPD EEPROMs) and the K.2 (U.2 → PCIe x4 CEM breakout) fall into this category.

Figure 1. Narrow PMOD-interface board for interfacing with the SPD EEPROMs on the two installed DDR4 DIMMs
Figure 2. PCIe U.2 connector to PCIe x4 CEM connector extender board

Thriftiness in practice

While this strategy of developing prototypes touches on many Oxide values (as discussed below), Thriftiness deserves special attention. Making inexpensive hardware has never been easier! Quick-turn PCB houses, both offshore and onshore, have achieved incredibly low cost while maintaining high quality. We had 50 K.2r2 PCBs with impedance control and a framed stencil fabricated for under $400 USD. For something so simple (BOM count under 10 parts), we built these in-house using an open-source pick and place machine (Lumen PNP from Opulo), and a modified toaster oven with a Controleo3 controller. We’ve also done hot-plate reflow and hand assembly. And while we will outsource assembly when it makes sense due to complexity or volume, for these simple, low-volume designs we see real benefits in self-building: we can build as few or as many as we want, do immediate bring-up and feed any rework directly into the next batch, and there’s none of the overhead of working with a supplier (getting kitted parts to them, quotes, questions, etc.). A good example of this was on the Dimmlet: I messed up the I2C level translator circuit by missing the chip’s requirements about which side was connected to the higher voltage. Since I was hand-assembling these, I built one, debugged it, and figured out the rework required to make it function. Since this rework included flipping the translator and cutting some traces, catching this issue on the first unit made it much easier to rework the remaining boards before they went through assembly.

All of that to say, the cost of building small boards is really low. A single prototype run that saves a "real" board re-spin pays for itself immediately. Enabling software development before "real" hardware lands pays for itself immediately. Even when things don’t work out, the cost of failure is low; we lost a few hundred dollars and some engineering time, but learned something in the process.

Because of this low cost, we can use a "looser" design process, with fewer tollgates and a less formal review/approval process. This lowers the engineering overhead required to execute these designs. We can have more informal reviews, a light-weight (if any) specification and allow design iteration to happen naturally. Looking at the designs, we have multiple examples of design refinement like the K.2r2 which improved on the electrical and mechanical aspects of the original K.2, and a refinement to the sled’s bench power connector boards resulting in a more compact and ergonomic design that improves mating with sleds in their production sheet metal.

Experience and the evolution of our strategy

Early in our company’s history, the team built out a development platform, named the Gemini Bring-up board, representing the core of the embedded design for our product: our Gemini complex (Service Processor + Root of Trust + Management Network plane). The resulting platform was a very nice development tool, with hilarious silkscreen and some awesome ideas that have continued informing current and future designs, but we rapidly outgrew this first design. While the major choices held, such as which microcontrollers are present, the still-nebulous design of the actual product, and subsequent design iteration, left the periphery of the bring-up board bearing little resemblance to the final Gemini complex design. The changes came from a variety of unforeseen sources: the global chip shortage forced BOM changes, and a better understanding of the constraints/needs of our product necessitated architecture changes, resulting in further drift between this platform and what we intended to implement in the product.

Figure 3. First custom Oxide hardware with SP and RoT

A re-imagining of what would be most useful gave way to the Gimletlet, a major work-horse for in-house development, designed (and initially hot-plate reflowed) by Cliff. The Gimletlet is essentially a development board using the STM32H7 part that we’re using as our Service Processor (SP) in our product. It provides power and basic board functionality, including a couple of LEDs and a dedicated connector for a network breakout card, and breaks out most of the remaining I/O to PMOD-compatible headers. This choice has been key to enabling a very modular approach to prototyping, recognizing that less is more when it comes to platforms. The Gimletlet design means that we can build purpose-built interface cards without needing to worry about network connectivity or processor support, simplifying the design of the interface cards and letting them share a common board support package.

Figure 4. Custom STM32H7 board with I/O breakout to many PMOD interfaces

Our team has learned that modularity is key to making these small proto-boards successful. It does mean workspaces can get a little messy with a bunch of boards connected together, but we’ve found this to be a good balance, allowing our team members to cobble together a small, purpose-built system that meets their specific needs, and allowing us to easily share these common, low-cost setups with our distributed team. The modularity also means that storing them is relatively easy as they can be broken down and stashed in small boxes.

Figure 5. Gimletlet with Ignitionlet, SPI mux, Dimmlet, RoTCarrierCarrier, and RoTCarrier connected

Our values tie-ins

There are some obvious values tie-ins like teamwork and thriftiness as already mentioned, but as I started writing this section I realized we hit more of our values than I had even realized. Rather than attempt to enumerate each one, I wanted to hit on some maybe less-obvious ones:

  • Humor: The silkscreens on our boards contain jokes, word-play and other silliness because we enjoy making our co-workers laugh and want our work to be fun too. The funny silkscreen is often snuck in post-review, and thus a surprise to co-workers as they open the finished hardware. Engineering demands creativity — I’ve worked at places where silliness baked into a board would be frowned upon, but at Oxide it is supported and even encouraged! This enables team members to bake a little bit of their personality into these designs, while allowing the rest of the team to have fun as it’s discovered.

Gemini bring-up board with silkscreen riffing on Root of Trust vs. Route of Trust vs. Newt of Trust, and pointing out the untrustworthiness of vendor boot ROMs
Figure 6. Preview of the Gemini bring-up board

  • Rigor/Urgency: We often find Rigor and Urgency in tension with each other, but in this case, they are complementary. The time from concept to ordering of a PCB on some of these designs is measured in hours or days, not weeks. Being able to move quickly from a paper concept to a physical manifestation of that concept in real hardware has been instrumental in grounding our assumptions and informing our designs. We’re able to quickly iterate in areas where we have questions, driving resolution without holding up the rest of the design. This work directly contributes to better architecture and circuit design decisions in our "real" designs.

  • Transparency/Responsibility/Teamwork: We believe in openness, including about our own designs, so we’re opening up the various proto-board design repositories for reference, in the hope that something there is useful in your own hardware endeavors. We created these designs because we wished they existed; some may be a bit specific to our use cases, but there are generally useful things there too. They are mostly KiCAD designs, and support for them is "as-is" since we’re focused on getting our product out the door, but feel free to reach out in the repo with questions and we’ll attempt to answer on a best-effort basis.

A Tool for Discussion Oxide Computer Company Blog

At Oxide, RFDs (Requests for Discussion) play a crucial role in driving our architectural and design decisions. They document the processes, APIs, and tools that we use. The workflow for the RFD process is based upon those of the Golang proposal process, Joyent RFD process, Rust RFC (Request for Comments) process, and Kubernetes proposal process. To learn more about RFDs and their process, you can read this post.

As with RFCs, our philosophy for RFDs is to allow timely discussion of rough ideas while still serving as a permanent repository for more established ones.

Oxide RFDs are essentially a collection of AsciiDoc documents stored in a GitHub repo. They can be quickly iterated on in a branch, discussed actively as part of a pull request to be merged, or commented upon after having been published.

Whilst a repo is a useful storage and collaboration tool, there are a number of drawbacks: it doesn’t provide the best reading experience, is limited in terms of AsciiDoc support, and is challenging to share externally. To address these issues we developed an internal RFD site. This post serves as a showcase for that site and gives a brief look at some of its features.

RFD directory

Users land directly on the directory. By default it is sorted by last-updated date, giving users an idea of which RFDs are actively being worked on.

RFD site homepage

Full-text search

Full-text search is powered by a self-hosted Meilisearch instance. The search index is automatically updated whenever an RFD is edited. Users can access the search function through the navigation menu or by using the hotkey CMD+K and can quickly navigate through the search results using their keyboard whilst previewing the associated RFD.

Search dialog showing results and a preview of the matched RFD

Inline PR discussion

The discussion surrounding an RFD is crucial to understanding its context, but until recently users would have to open the associated pull request in a separate tab to view its comments. To improve this experience, we’ve implemented a feature that uses the GitHub API to fetch the pull request discussion and display the comments that are still actively attached to a line alongside the part of the document they relate to.

Pop-over menu with comments that relate to the part of the document they are next to
Figure 1. Inline comments

We achieve this by using the getLineNumber function in asciidoctor.js, which allows us to map the raw line number of the comment (from the GitHub API) to the nearest block in the rendered document. While this method may not pinpoint the exact line, it is usually accurate enough.
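A minimal sketch of that mapping, written in C rather than the site's own JavaScript and not taken from the site's code: it assumes the first source line of every rendered block has already been collected into an ascending array (for example, by calling getLineNumber on each block), and it reads "nearest" as the last block starting at or before the commented line. The function name nearest_block is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: block_starts[] holds the first source line of each
 * rendered block, sorted ascending, with n_blocks >= 1. Returns the index of
 * the last block starting at or before comment_line; lines before the first
 * block fall back to block 0. */
size_t nearest_block(const unsigned *block_starts, size_t n_blocks,
                     unsigned comment_line)
{
    /* Binary search for the first block that starts after comment_line. */
    size_t lo = 0, hi = n_blocks;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (block_starts[mid] <= comment_line)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* lo blocks start at or before comment_line; the last of them wins. */
    return lo == 0 ? 0 : lo - 1;
}
```

For block starts {1, 5, 12, 20}, a comment on line 13 maps to index 2, the block beginning on line 12. A binary search keeps the lookup cheap even for long RFDs with many comments.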

To avoid slowing down page loads, we use Remix’s deferred response feature to stream in the comments asynchronously, waiting only for the critical RFD content to finish loading.

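  // The RFD itself is awaited: the page cannot render without it.
  const rfd = await fetchRfd(num, user)
  if (!rfd) throw resp404()

  // this must not be awaited, it is being deferred
  const discussionPromise = fetchDiscussion(rfd.discussion_link, user)

  // defer() sends rfd with the initial response and streams the
  // discussion to the client once discussionPromise settles
  return defer({ rfd, discussionPromise })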

From anywhere on the page, users can open a dialog showing the full discussion of an RFD. This dialog also provides the ability to jump directly to the line that is being commented on.

Dialog with a GitHub PR style timeline showing comments and snippets of the raw document
Figure 2. Full discussion

Inter-RFD linking

When an RFD document references another RFD within its content, the user can hover over the reference to see a preview of the title, authors, status, and the date of the last update. This makes it easy for users to understand the context and relationship between different RFDs, and quickly access the related documents.

Pop-over that previews the linked RFD

Jump-to menu

For users who know the title or number of the RFD they want to view, a menu can be opened by pressing CMD+/ from any page on the site. This menu allows users to quickly filter and select the desired RFD using their keyboard.

Navigation modal that shows a list of RFDs being filtered by an input

Upcoming

The internal tooling around RFDs is always improving, and as it does, we hope that the way we collaborate will also improve. There is still work to be done on the discoverability of documentation, such as adding more tools to filter and tag RFDs, and creating collections of documents. This will make it easier for new employees to get up to speed and to manage the challenges that come with a growing team and an increasing amount of documentation. Understanding the whole stack is valuable, as it shows us the impact of our work on other parts of the product.

Additionally, we need to consider how we can make this process more accessible to everyone. Writing an RFD currently requires cloning a Git repo, running a script, committing, pushing, and making a PR. We are thinking about how to do this all through the web app with an embedded text editor. Oxide is and will continue to be an engineering-led organization, but RFDs are not just for engineers. Making it easier for everyone to create and collaborate on these documents will result in richer conversations.

Navigating Today’s Supply Chain Challenges Oxide Computer Company Blog

We’ve all experienced it. From toilet paper, exercise equipment, toys, cars, and everything in between, the supply chain during COVID has been blamed for many consumer goods shortages, and rightfully so. During lockdown, how many of us stalked our local warehouse clubs for that elusive delivery of toilet paper, scared of the implications if none was found? Or maybe you tried negotiating the price on eBay for a set of weights that was 3-4x the usual cost? The shortages seen by the average consumer also heavily plagued electronics manufacturers and their customers.

Now imagine being a start-up during COVID. A start-up in the electronics industry. A start-up in competition for those highly demanded, severely constrained electronic components. A start-up with no name recognition, no history, and no relationships with manufacturers and/or distributors. Seemingly simple items like capacitors and resistors saw 20+ week lead times, with other parts advertising lead times of 52, 98, and even 104+ weeks. That’s what the supply chain looked like for Oxide in 2021, and in many component categories, still looks like today.

Our Operations Team has been hard at work since late 2020 trying to secure supply for a product that, throughout 2021 and into 2022, has continued to undergo design changes. The procurement function became a delicate balancing act, taking into account lead time, cost, and industry outlooks, and working closely with our engineering team regarding upcoming design changes. How much faith could we put in the demand for a given part today, when we knew an updated Bill of Materials (BOM) would be published in a few weeks? For parts with lead times that would extend past our first customer ship date, how much supply should we purchase 12-18 months ahead of schedule, knowing our design was not finalized? Working with borrowed money (literally, from our investors), we needed to quickly put in place a robust procurement system to balance the issues we faced. We needed an actionable plan that solved supply issues on many fronts.

So, we did what the average consumer did during COVID: we stalked the stores (in our case, online distributors) day and night, weekdays and weekends, waiting for restocks. We negotiated with suppliers, investigating whether additional inventory was available but being held back. In some cases, being a start-up and only needing small quantities was helpful: we were able to get sample quantities of parts that would last us through our engineering build cycle. However, being a small start-up also meant we were up against the big guys. Getting recognition of our existence, let alone inventory allocated to us, was often a stressful, tedious process. What have we found that helped our team the most? Strategic supplier relationships.

We can’t say enough about the value of strong, trusting relationships with your suppliers. All of us on the Operations Team at Oxide have extensive backgrounds at some of the world’s top manufacturing and supply chain companies. We’ve seen and heard it all. We know the lengths many procurement professionals will go to in order to secure supply during allocations. They may inflate demand, knowing their allocation quantity is based on their demand, or they may communicate required dates that are several weeks or months ahead of when the supply is actually needed. However, those responsible for the supply allocations usually realize this. When the truth comes to light, that company’s future demand is often taken with a grain of salt, and it becomes difficult to trust that company again. That’s where the Oxide Ops Team differentiates itself. Oxide’s principles and values truly drive our everyday work. We’re firmly committed to our principles of Integrity, Honesty, and Decency, and integrate them into all of our business practices. We strive to balance our sometimes conflicting values in order to strengthen our vendor relationships. Here are a few of the Oxide values and how we showcase them in our supply chain relationships.

  • Candor — We’re upfront about our needs, including quantities and dates. As these items change, we do our best to proactively communicate those changes to our suppliers. We know there will be times we need our suppliers to jump through hoops for us, to expedite, to get supply not otherwise allocated to us. However, we also understand these should be one-off instances and not the norm. We want our supplier reps to succeed just as much as we strive for success at Oxide. Being candid helps ensure we are all set up for success.

  • Rigor — We demand a lot from ourselves, but we also demand a lot from our suppliers. If a date slips or the component quality isn’t as expected, we request our supplier proactively communicate, root cause, and implement corrective action as needed. We have a small team and we rely on our suppliers to have the same sense of rigor as we do at Oxide.

  • Teamwork — We look at the relationships with our suppliers as extended team members. We want to instill in them a sense of pride for the Oxide product, just as much as we support them and their company. Successes and failures are shared amongst everyone involved. We will not be successful at Oxide without our extended team.

  • Thriftiness – We’re a start-up with maniacally focused founders, a very involved board of directors, and limited capital. We’re building a massive product while being very cognizant of costs. Given our small size and start-up status, we rely on our relationships with our suppliers, coupled with massive amounts of internet searching and price comparing, to try and get the best costs we can. We know we’re paying more for items than the big guys, but we’re trying to close that gap as much as we can. Getting our suppliers on board with our vision, and getting them excited about the Oxide rack, is instrumental in price negotiations. We’re also sure that one day we’ll be one of those big guys. :)

Aligning ourselves with suppliers we can trust and partners invested in the success of Oxide has allowed us to successfully navigate the current supply chain conditions. The value of treating others with respect and kindness cannot be overstated. Building strong relationships on an unwavering foundation of Integrity, Honesty, and Decency is key. Adhering to our principles and balancing our values will continue to drive our successes.

When Allocation Hits the Fan

Allocation. A term no supply chain professional wants to hear. Even more so when you’re a start-up in a critical test phase of your new product which has yet to launch. Founders, investors, manufacturing partners and others, all anxious for an update. That timing device, tiny in size, but mighty in nature, gating your entire build process. Your product can’t run without it. There are no suitable alternates and there is no supply via your normal channels. There is, however, a large supply showing in the broker market. Over 100K. Wow, score! You may have just solved your high-profile supply constraint. Now the only thing left to decide is which of these companies will get your business.

Before you get too excited, though, you need to pause, step back, and analyze the situation. Could there really be over 100K of this highly constrained part on the broker market? That $0.38 part is showing for between $1.10 and $7.08 on the broker market. How could there be that much variation? Is the higher-priced part “real” while the lower-priced part is possibly counterfeit or stolen? What would happen if you were to receive a counterfeit part? The brokers typically all provide some sort of generic “guarantee” on their website. How bad could it be? You could order the part, have it tested, realize it’s not real, and get your money back, right? It sounds easy, though it rarely is.

Based on reading reviews of brokers, speaking with people who have had bad experiences, and my own search for broker parts, I’ve found that many broker websites are not legitimate. Embrace your inner pessimist and begin your search being wary of everyone. While there are certain countries which automatically throw up red flags, there are also plenty of US-based companies I wouldn’t consider doing business with. It’s ok to be pessimistic now and then. You need to protect your company, your product, and your reputation. What sorts of issues could you face with parts purchased from a broker? You could end up with parts that have old date codes, were not stored correctly in humidity-controlled/ESD bags, are damaged, or are downright counterfeit. Things to look for when evaluating an online broker:

  1. Misspellings & Poor Grammar. Sure, we all make mistakes from time to time. I had to make sure “misspellings” was spelled correctly! However, a professional website should not have misspellings, blatant grammar issues or obviously poor translation.

  2. Authorized Distributor Claims. Make sure to go to the actual manufacturer’s website and look up their authorized distributors. If a broker claims they are authorized and aren’t listed, do not order anything, ever.

  3. 100% Guaranteed Claims – Authenticity, Quality, etc. Read the superfine print. Understand what is being guaranteed. A phrase such as “guaranteed to function like original part” should make you pause, given the use of the word “like”. Most sites seem to offer some feel-good blurb on how they’ve had the parts tested in-house or by a third-party tester and results are available. What are the qualifications of the tester? What were they testing? Some websites will even tell you they support you having the part tested: if your tests show the part is not original, you can return the part with your test data. However, reading deeper into their return policy, often found in a different section of their website, will reveal that you cannot return any part that has been opened and/or is not the full quantity ordered. Return windows may also be abnormally short, not affording you enough time to test the parts. Unfortunately for you, to perform a physical and electrical verification, you must open the packaging and use several parts for testing. You have now invalidated your ability to return them.

  4. Payment Terms – Be extremely cognizant of payment terms, types of payment accepted, and the entity you are paying. Research the payee and bank information. There are certain payment methods that are more secure and better for this type of transaction than others.

  5. Do Your Research – Spend some time reading online reviews. Take note of who is leaving any good reviews and where they are coming from. Pay closer attention to the negative reviews. While it’s true more people are likely to leave a negative review than a positive review, if you see consistency in the negative reviews, it’s probably best to move along. Common negative reviews of brokers include no communication after payment was sent, payment sent but the part not actually in stock, payment sent but the part never received, and parts that do not work as expected (old date code, damaged, counterfeit). A quick search can open your eyes to a lot of questionable activities!

After checking out many of the websites showing stock on the part you desperately need, you’re back to square one. You’ve realized you can’t possibly trust one of these unheard-of websites to provide an instrumental component in your product. You’ve spoken with the manufacturer and distributors and nothing is available, not even samples. What next? Give up? Tell your founders and investors the timeline is in jeopardy and all forward progress must come to a halt? None of those sound like great alternatives.

I’m blessed in that I have a history in the broker market from a previous job in the computer industry. I’ve been exposed to all sorts of people and stories, some that make you just shake your head in disbelief. I already have a handful of people I feel comfortable with and enjoy working with. These are a few areas I look at when deciding whether to work with a broker.

  1. Certifications (ISO, OSHA, R2, eStewards, etc.) – While these certifications are typically for manufacturers and/or refurbishers, knowing that a company has achieved any of them gives me a sense of comfort. Can a certification be “bought”? Sure, though no one involved would admit it. Don’t make this your only deciding factor, but do take it into account. Physical site visits can help clarify any outstanding questions.

  2. Length of Time in Business – The good ones survive the times. Some not-so-great ones survive as well. Some new ones have yet to make a name for themselves but may be great options and provide amazing customer service. Or, they may cut corners to try and increase the bottom line. Do your research and talk to respected members of the secondary market community.

  3. Reputation – Not everyone will agree on a binary assignment of “good” or “bad” for a company. Again, speak with people in the industry about the company, its leadership, and its employees. Do your research on how the company started, how they’ve grown and changed over time, employee turnover, focus for the future, etc.

  4. Component Testing – As we’ve seen, many companies say they do component testing to verify legitimacy of a part. Ask questions, ask to see sample reports, inquire where they do their testing (in-house, 3rd party), etc. You may even consider asking if you can be present during the testing. You can learn a lot from how that question is answered, even if you don’t plan on actually being present.

  5. Warranty – Do they offer the same level of warranty as the manufacturer? What is covered by the warranty? How are warranty claims made?

  6. Trust – Trust the people you’re going to be working with and choose people you’re going to enjoy working with. There are a lot of good brokers out there, so you do have options. Find one you sync with, and the relationship will be off to a positive start.

In the end, choosing a broker to work with is both a tactical and personal choice. We don’t often get to choose who we work with, but when presented with an opportunity to do so, make sure you choose wisely. The quality and security of your product depend on it, and oftentimes, so does your sanity.

Benefits as a Reflection of Values Oxide Computer Company Blog

“We offer the best health insurance we could find” is what we promise in our job postings. On paper, this is accurate: the health insurance Oxide offers is the best plan we can find that is offered to small businesses.

What we left unsaid until now is that the best health insurance offered to small businesses is, in fact, not very good at all if you don’t neatly fit into a narrow demographic; the bitter irony is that the US healthcare system isn’t designed for those of us who rely on it the most. And life-saving treatments that aren’t needed by able-bodied cisgender people are, more often than not, deemed “not medically necessary” in off-the-shelf insurance policies simply because there is no law to require their coverage. In our society’s bizarre and accidental system, we rely on employers to provide benefits everyone should have, including healthcare, retirement plans, and dependent care.

When I came out as trans and started seeking medical care, I worked at a large employer that directly paid for the medical costs of its employees and dictated how their insurance networks process claims. I had the benefits I needed because other trans people fought for them and the company could unilaterally choose to provide them. Startups don’t have this luxury and are at the whims of insurance companies to keep the cost of hiring and retaining employees manageable. Meanwhile, insurers put profit over care, and won’t budge on off-the-shelf benefits plans unless Congress forces the issue – yet as I write this, lawmakers across the country are either ignoring us or actively stripping away our right to get the healthcare we need, so we’re not holding our breath.

With this in mind, I was initially uneasy about applying to Oxide because we didn’t make our benefits clear to prospective applicants. But I still applied — not out of blind faith, but because I felt a company built on these values would put its people first and work to provide what I needed. Shortly after I started, despite insurance companies not budging even an inch, our CEO Steve announced a reimbursement arrangement that would cover $10,000 of out-of-pocket healthcare expenses for gender-related healthcare per year. This isn’t perfect, but it’s a damn good start.

Future applicants shouldn’t need to put this much faith in us upholding our values in order to be comfortable about applying, though. Here’s a much clearer summary of our benefits:

  • We offer the best medical, dental, and vision insurance plans we can find as a small employer; premiums are 100% paid by Oxide for both employees and dependents.

  • We offer an optional FSA plan for out-of-pocket healthcare and dependent care expenses.

  • We reimburse (through an HRA) up to $17,000 annually: $10,000 annually for gender affirmation or infertility expenses, $5,000 annually for hearing and laser eye surgery expenses, and $2,000 annually for dental and miscellaneous healthcare expenses. The HRAs cover out-of-pocket expenses regardless of whether insurance covers them partially or not at all.

The bottom line: Where our insurers fall short, we will work to meet our employees where they need us to be.

As with our compensation model, the benefits we provide embody our mission, principles, and values: We can’t focus on our mission if we’re distracted by healthcare expenses. Our principles of integrity, honesty, and decency compel us to care for our teammates and their families. And our approach to benefits intersects with several of our values:

  • It is driven by our empathy. Even if we don’t have the context for someone else’s needs, we don’t need it: none of us ever want to have to worry about healthcare expenses for ourselves, our partners, or our families, and we wouldn’t wish it on each other. We understand that this approach is necessary and important, even if some of us don’t directly benefit.

  • It is a step toward building a more diverse team. In this regard, we are not meeting the same standard to which we hold ourselves with our other values. We strive to change that by embracing the needs of our current team as well as those of prospective future teammates. Benefits are a critical part of what employers bring to the table for candidates, and we don’t want to inhibit people from applying to Oxide because of the perception that small companies can only provide meager coverage that doesn’t work for them. While our compensation model ensures we have no pay gap, our approach ensures that people relying more on these vital benefits are still getting paid the same as their peers. Every member of the team has different needs, and we will do our best to address as many of them as we can.

  • It is a reflection of our resilience. We will do everything within our power to take care of our employees. We will continue to fight our insurers to provide for basic healthcare needs, and when they fall short, we will find other ways to provide for those needs (such as offering HRAs) while we continue to fight. We will never stop advocating for our employees.

  • It is a fundamental responsibility to our teammates. We don’t treat healthcare benefits as a “perk”: it is a basic need for all of us. And while it is tiring to continue to fight against an uncaring and unwavering healthcare system only to make incremental progress, we know that we must do so to keep our mission in sight.

Finally, in the spirit of transparency: we’ve made our benefits information public starting today. These are close to the same documents employees see when signing up for benefits (with some information only relevant for employees removed). We’re working to provide additional information for specific situations that we’re aware of — and we know that our benefits are not complete; there are innumerable needs none of us have experienced or thought of.

We share our benefits information for two important reasons: first, we want applicants to be able to learn everything we know about our benefits so that they feel confident in applying even in the face of a healthcare need that is not commonly covered; second, we want to give employees at other similarly-situated companies tools (such as suggesting HRAs) to help fight for the healthcare coverage they need. We have a responsibility to take care of our employees, but we also have a responsibility to make our industry a better place to work for everyone.

Another vulnerability in the LPC55S69 ROM Oxide Computer Company Blog

Here at Oxide, we continue to work on building servers as they should be. Last year, we discovered an undocumented hardware block in the LPC55S69 (our chosen part for our product’s Root of Trust implementation) that could be used to violate security boundaries. This issue highlighted the importance of transparency as an Oxide value, which is why we are bringing another recently discovered vulnerability to light today. While continuing to develop our product, we discovered a buffer overflow in the ROM of the LPC55S69. This issue exists in the In-System Programming (ISP) code for the signed update mechanism, which lives in ROM. This vulnerability allows an attacker to gain non-persistent code execution with a carefully crafted update, regardless of whether the update is signed. This can be used to circumvent restrictions when the chip is fully locked down and also to extract the device’s DICE Unique Device Secret (UDS). Because this issue exists in ROM, there is no known workaround other than disabling all hardware and software paths to enter ISP mode. CVE-2022-22819 has been assigned for this vulnerability. Finding two separate issues in the same chip only strengthens Oxide’s assertion that keeping code proprietary does not improve product security, and that hardware manufacturers such as NXP should make their ROM source available for customer review.

Updates are hard

Before discussing the exploit, it’s worth thinking about the higher-level problem: how do you update the software on a microcontroller once it leaves the factory? This turns out to be a tricky problem, since a bug in the update path can leave you with a non-functional device. To make this problem easier, chip makers like NXP provide some method to put the chip in a mode that allows safe modification of flash, independent of the installed firmware. NXP offers this via its In-System Programming (ISP) mode.

ISP mode allows a host (typically a general purpose computer) to read and write various parts of the chip including flash by sending commands to the target over a variety of protocols. The LPC55S69 supports receiving ISP commands over UART, SPI, I2C, and, on variants that include the necessary peripheral, CAN. The LPC55S69 can be configured to require code be signed with a specific key. In this configuration, most commands are restricted and changes to the flash can only come via the receive-sb-file command.
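To make the consequence of that restriction concrete, here is a hypothetical sketch of such gating, not NXP's ROM code: the enum lists an illustrative subset of flash-modifying commands, and the enum, the function, and the require_signed flag are all assumptions. It shows why receive-sb-file matters: once signed updates are required, it is the only path in this sketch that can change flash, so every attempt to modify flash must go through the SB2 parser.

```c
#include <stdbool.h>

/* Hypothetical subset of ISP commands that can change flash contents. */
enum isp_cmd {
    ISP_WRITE_MEMORY,
    ISP_FLASH_ERASE_REGION,
    ISP_RECEIVE_SB_FILE,
};

/* Returns true if a flash-modifying command may run. When the part is
 * configured to require signed updates, only receive-sb-file is accepted,
 * funneling every flash change through the SB2 update parser. */
bool isp_flash_cmd_allowed(enum isp_cmd cmd, bool require_signed)
{
    if (!require_signed)
        return true;
    return cmd == ISP_RECEIVE_SB_FILE;
}
```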

The update format

The receive-sb-file ISP command uses the SB2 format. This format includes a header followed by a series of commands which can modify the flash or start code execution. Confidentiality and integrity of an update are provided by encrypting the commands with a key programmed at manufacturing time, inserting a secure digest of the commands in the update header, and finally signing the header. The C representation of the first part of the header looks like the following:

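#include <stdint.h>

/* Assumed helper types (not shown in the original): a 32-bit section ID
 * and a version made of three 16-bit fields, each padded to 32 bits. */
typedef uint32_t section_id_t;
typedef struct {
    uint16_t major, pad0, minor, pad1, revision, pad2;
} version_t;

struct sb2_header_t {
    uint32_t nonce[4];

    uint32_t reserved;
    uint8_t m_signature[4];
    uint8_t m_majorVersion;
    uint8_t m_minorVersion;

    uint16_t m_flags;
    uint32_t m_imageBlocks;
    uint32_t m_firstBootTagBlock;
    section_id_t m_firstBootableSectionID;

    uint32_t m_offsetToCertificateBlockInBytes;

    uint16_t m_headerBlocks;

    /* block index of the key blob, i.e. the first block after the header */
    uint16_t m_keyBlobBlock;
    uint16_t m_keyBlobBlockCount;
    uint16_t m_maxSectionMacCount;
    uint8_t m_signature2[4];

    uint64_t m_timestamp;
    version_t m_productVersion;
    version_t m_componentVersion;
    uint32_t m_buildNumber;
    uint8_t m_padding1[4];
};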

The bug

The SB2 update is parsed sequentially in 16-byte blocks. The header identifies some parts of the update by block number (e.g. block 0 is at byte offset 0, block 1 at byte offset 16, etc.). The bug comes from improper bounds checking on the block numbers. The SB2 parser in ROM copies the header to a global buffer before checking the signature. Instead of stopping once the full header has been copied (a total of 8 blocks, or 128 bytes), the parsing code copies up to m_keyBlobBlock blocks. In a correctly formatted header, m_keyBlobBlock refers to the block number right after the header, but the code does not check its bounds. If m_keyBlobBlock is set to a much larger number, the code continues copying bytes beyond the end of the global buffer: a classic buffer overflow.
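A minimal sketch of the missing check, assuming a 128-byte (8-block) global header buffer as described above. The names g_header_buf, block_to_offset, and copy_header are hypothetical and this is not the ROM's code; the vulnerable pattern described above corresponds to the same copy without the first bounds check.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SB2_BLOCK_SIZE    16u
#define SB2_HEADER_BLOCKS 8u /* 128-byte header, per the description */

/* Hypothetical stand-in for the ROM's global header buffer. */
static uint8_t g_header_buf[SB2_HEADER_BLOCKS * SB2_BLOCK_SIZE];

/* Block n of an SB2 image starts at byte offset n * 16. */
static size_t block_to_offset(uint32_t block)
{
    return (size_t)block * SB2_BLOCK_SIZE;
}

/* Copies blocks [0, key_blob_block) of the update into g_header_buf.
 * The vulnerable pattern omits the first check; with it, an oversized
 * key_blob_block is rejected instead of writing past the buffer. */
bool copy_header(const uint8_t *update, size_t update_len,
                 uint16_t key_blob_block)
{
    if (key_blob_block > SB2_HEADER_BLOCKS)
        return false; /* would overflow g_header_buf */

    size_t len = block_to_offset(key_blob_block);
    if (len > update_len)
        return false; /* would read past the data actually received */

    memcpy(g_header_buf, update, len);
    return true;
}
```

With a well-formed header (m_keyBlobBlock = 8) the copy proceeds; any larger value, such as 0xFFFF, is rejected before memcpy runs rather than overwriting whatever follows the buffer.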

Impact

The full extent of this bug depends on system configuration, with code execution possible in many circumstances. A simple version of this attack can enable SWD access (normally disabled during ISP mode) by jumping to existing code in ROM. A more sophisticated attack has been demonstrated as a proof of concept to provide arbitrary code execution. While code execution via this vulnerability does not directly provide persistence, attack code executes with the privileges of ISP mode and can thus modify flash contents. If the system is configured for secure boot and sealed via the Customer Manufacturing Programming Area (CMPA), modifications of code stored in flash will be detected on subsequent boots. Additionally, ISP mode executes while the DICE UDS (Unique Device Secret) is still accessible, allowing for off-device derivation of keys based on that secret.

Mitigation

Because this is an issue in the ROM, the best mitigation without replacing the chip is to prevent access to the vulnerable SB2 parser. Disabling ISP mode and not using flash recovery mode will avoid exposure, although this does mean the chip user must come up with alternate designs for those use cases.

The NXP ROM also provides an API for applying an SB2 update directly from user code. Any use of this API still provides a potential path to the bug. Checking the signature on an update with another ROM API before calling the update API would verify that the update comes from a trusted source. This is not the same as verifying that the update data is correct or not malicious, but signature verification does offer some degree of confidence if using the SB2 update mechanism cannot be avoided.
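The following minimal sketch shows that ordering. It assumes two hypothetical entry points, passed in as function pointers: one that only checks a signature, and one that parses and applies the update. The names and signatures are illustrative and are not NXP's ROM API. The stubs exist only to exercise the control flow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for two ROM API entry points; NOT NXP's real
 * names or signatures. */
typedef bool (*sb2_verify_fn)(const uint8_t *image, size_t len);
typedef int (*sb2_apply_fn)(const uint8_t *image, size_t len);

enum {
    SB2_APPLIED = 0,
    SB2_REJECTED_SIGNATURE = -1,
    SB2_APPLY_FAILED = -2,
};

/* Hand the image to the SB2 parser only after its signature verifies.
 * This establishes the update's origin; it does not prove the contents
 * are correct or benign. */
static int sb2_update_gated(const uint8_t *image, size_t len,
                            sb2_verify_fn verify, sb2_apply_fn apply)
{
    if (!verify(image, len))
        return SB2_REJECTED_SIGNATURE;
    return apply(image, len) == 0 ? SB2_APPLIED : SB2_APPLY_FAILED;
}

/* Example stubs for illustration only. */
static int g_apply_calls;

static bool stub_verify_ok(const uint8_t *image, size_t len)
{
    (void)image;
    return len > 0;
}

static bool stub_verify_bad(const uint8_t *image, size_t len)
{
    (void)image;
    (void)len;
    return false;
}

static int stub_apply(const uint8_t *image, size_t len)
{
    (void)image;
    (void)len;
    g_apply_calls++;
    return 0;
}
```

This narrows who can reach the parser rather than fixing it. A correctly signed but malformed update would still be handed to the vulnerable code.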

Conclusion

As exciting as it was to find this issue, it was also surprising given NXP’s previous statement that the ROM had been reviewed for vulnerabilities. While no review is guaranteed to find every issue, this issue once again highlights that a single report is no substitute for transparency. Oxide continues to assert that open firmware is necessary for building a more secure system. Transparency in what we are building and how we are building it will allow our customers to make a fully informed choice about what they are buying and how their system will work. We, once again, invite everyone to join us in making open firmware the industry baseline.

Timeline

2021-12-22

Oxide discovers vulnerability while attempting to understand SB2 update process

2021-12-23

Oxide discloses vulnerability to NXP

2022-01-03

NXP PSIRT acknowledges the report

2022-01-04

NXP PSIRT acknowledges the vulnerability

2022-02-28

NXP discloses issues in an NXP Security Bulletin (NDA required) and confirms that a new ROM revision, and thus new part revisions, are required to correct the vulnerability in affected product lines.

2022-03-23

Oxide discloses as CVE-2022-22819

Hubris and Humility Oxide Computer Company Blog

When we started Oxide, we knew we were going to take a fresh look at the entire system. We knew, for example, that we wanted to have a true hardware root of trust and that we wanted to revisit the traditional BMC. We knew, too, that we would have our own system software on each of these embedded systems, and assumed that we would use an existing open source operating system as a foundation. However, as we waded deeper into these embedded systems, and especially their constraints of security and robustness, we found that what we wanted out of the existing operating systems wasn’t necessarily what they offered (or even wanted to offer).

As time went on in early 2020 and we found ourselves increasingly forcing existing systems out of the comfort of their design centers, we wondered: was our assumption of using an existing system wrong? Should we in fact be exploring our own de novo operating system? In particular, our colleague Cliff Biffle, who had a ton of experience with both Rust and embedded systems, had a vision for what such a system might look like (namely, the system that he had always wanted for himself!). Cliff dove into a sketch of his ideas, giving the nascent system a name that felt perfectly apt: Hubris.

After just a few weeks, Cliff’s ideas were taking concrete shape, and it was clear that there was a lot to like: the emerging Hubris was an all-Rust system that was not distracting itself by accommodating other runtimes; it was microkernel-based, allowing for safety and isolation; it employed a strictly synchronous task model, allowing it to be easily developed and comprehended; and it was small and light, allowing it to fit into some of the tight spots we envisioned for it. But for me, it was the way Cliff thought about the building of the system that really set Hubris apart: instead of having an operating system that knows how to dynamically create tasks at run-time (itself a hallmark of multiprogrammed, general purpose systems), Cliff had designed Hubris to fully specify the tasks for a particular application at build time, with the build system then combining the kernel with the selected tasks to yield a single (attestable!) image.

This is the best of both worlds: it is at once dynamic and general purpose with respect to what the system can run, but also entirely static in terms of the binary payload of a particular application — and broadly static in terms of its execution. Dynamic resource exhaustion is the root of many problems in embedded systems; having the system know a priori all of the tasks that it will ever see liberates it from not just a major source of dynamic allocation, but also from the concomitant failure modes. For example, in Hubris, tasks can always be safely restarted, because we know that the resources associated with a task are available if that task itself has faulted! And this eliminates failure modes in which dynamic task creation in response to load induces resource exhaustion; as Cliff has quipped, it is hard to have a fork bomb when the system lacks fork itself!
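A small, hypothetical C sketch can illustrate a task set fixed at build time. Hubris itself is written in Rust, and none of the names below come from it. The sketch only shows why restarting a task cannot fail for lack of resources when every task's slot is reserved statically.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration, not Hubris code: the full task set is fixed
 * at build time, so every task's resources are reserved up front. */
#define TASK_COUNT 3
#define TASK_STACK_WORDS 64

typedef enum { TASK_RUNNING, TASK_FAULTED } task_state_t;

typedef struct {
    const char *name;
    void (*entry)(void);
} task_desc_t;

typedef struct {
    task_state_t state;
    uint32_t generation;            /* bumped on every restart */
    uint32_t stack[TASK_STACK_WORDS];
} task_slot_t;

static void idle_main(void) {}
static void sensor_main(void) {}
static void net_main(void) {}

/* The "build-time" task table: there is no API to add entries at run time. */
static const task_desc_t TASKS[TASK_COUNT] = {
    { "idle", idle_main },
    { "sensor", sensor_main },
    { "net", net_main },
};

/* One statically allocated slot per task; nothing is ever malloc'd. */
static task_slot_t SLOTS[TASK_COUNT];

/* Restarting a faulted task reuses its own reserved slot, so the restart
 * never needs to allocate memory. */
static void task_restart(size_t i)
{
    memset(SLOTS[i].stack, 0, sizeof SLOTS[i].stack);
    SLOTS[i].state = TASK_RUNNING;
    SLOTS[i].generation++;
}
```

Because no run-time call can add a task, no code path exists in which creating one could exhaust memory.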

Precedent for the Hubris approach can be found in other systems like library operating systems, but there is an essential difference: Hubris is a memory-protected system, with tasks, the kernel, and drivers all in disjoint protection domains. (And yes, even in memory-safe languages like Rust, memory protection is essential!) In this regard, Hubris represents what we like about Rust, too: a creative solution that cuts through a false dichotomy to yield a system that is at once nimble and rigorous.

It was clear that Cliff was on the right track with Hubris, and the rest of us jumped in with gusto. For my own part, with debugging and debuggability deep in my marrow, I set to work writing the debugger that I felt we would need (and that Hubris deserved). Following Cliff’s lead, I dubbed it Humility, and it’s been exciting for me to see the debugger and the operating system work together to yield a higher-quality system.

We have known for quite some time that we would open source Hubris: not only is open source core to our own commercial thesis at Oxide, we also believe that the open source revolution, and its many advantages for customers, is long overdue in the lowest layers of the software stack. So we were waiting for the right occasion, and the Open Source Firmware Conference afforded us an excellent one. If you are a listener of our On the Metal podcast, you heard us talk about OSFC a bunch, and it felt entirely fitting that we would kick off our own open source firmware contribution there. And while the conference starts today, the good news is that you haven’t missed anything! Or at least, not yet: the conference is virtual, so if you want to hear Cliff talk about Hubris in his own words (and it’s before 12:10 Pacific today), it’s not too late to buy a ticket! (The recording will naturally be released after the conference.)

And of course, if you just want to play with the system itself, Hubris and Humility are both now open source! Start by checking out the Hubris repo, the Hubris docs, and the Humility repo. (And if you are looking for a hardware vehicle for your exploration, take a look at the ST Nucleo-H753ZI evaluation board — a pretty capable computer for less than thirty bucks!) We believe Rust to be a part of the coming firmware revolution, and it’s exciting for us to have a system out there that embodies that belief!

March 2020 Update Oxide Computer Company Blog

Hello friends!

I want to start by saying we wish you the very best during this unprecedented time in which we are all united. Our thoughts go out to everyone working hard to help those in need. We wish you and your families health and resilience.

Hard at work… and growing

A lot has happened at Oxide since we first de-cloaked in December, and I apologize for the lack of an official update on our end, other than our Twitter feeds. We’ve been hard at work building a product!

We are now a team of 15 people! Everyone was in Emeryville for the Open Compute Summit, which was cancelled, but we still made the best of it by having company-wide face-to-face architecture discussions. We even snapped a photo with the whole team. As Bryan said: we cannot wait for this image to be burned into a ROM.

Figure 1. Team

Computer History Museum

We also made sure to visit the Computer History Museum while everyone was in town. It was fun to have some folks from the open firmware community join us as well! Since it was not busy, we got to spend an unusually long time with the docent at the IBM 1401 demo, which was fascinating.

Figure 2. Computer history museum

On the Metal

Last month, we wrapped up the first season of our podcast, On the Metal. These were super fun to record and I know we are looking forward to Season 2 just as much as you are!

PCI

On February 18th, we received the most perfect PCI vendor ID: 01DE. Huge thanks to Robert Mustacchi for getting that!

The Soul of a New Machine at Stanford

Bryan gave a talk at Stanford on The Soul of a New Machine.

In the media

Tom Krazit at Protocol published a feature on what we are working on: This little server startup wants to take on a horde of tech giants.

Moving in

Cliff L. Biffle made us new signs for our conference rooms!


We have gotten some amazing mugs from folks for our collection; thank you all so much!

The phone lines are now open!

Last Monday, I opened our phone line to anyone who wanted to share stories about their hardware pain. Thank you to everyone for the wonderful conversations!

oxide.computer v2

On Wednesday, Jared Volpe shipped the redesign of this website! Pro tip: check out the 404 page ;)

Bryan at Oxidize 1K

On Friday, Bryan gave a talk at Oxidize 1K on "Tockilator: Deducing Tock execution flow from Ibex Verilator traces".

We will update this post once the video becomes available! For a good synopsis of the conference, check out this post.

Stay tuned!

That’s all for now. We will continue to update you with news as we go about building.