Advancing Cloud and HPC Convergence with Lawrence Livermore National Laboratory Oxide Computer Company Blog

Oxide Computer Company and Lawrence Livermore National Laboratory Work Together to Advance Cloud and HPC Convergence

Oxide Computer Company and Lawrence Livermore National Laboratory (LLNL) today announced a plan to bring on-premises cloud computing capabilities to the Livermore Computing (LC) high-performance computing (HPC) center. The rack-scale Oxide Cloud Computer allows LLNL to improve the efficiency of operational workloads and will provide users in the National Nuclear Security Administration (NNSA) with new capabilities for provisioning secure, virtualized services alongside HPC workloads.

HPC centers have traditionally run batch workloads for large-scale scientific simulations and other compute-heavy applications. HPC workloads do not exist in isolation—there are a multitude of persistent, operational services that keep the HPC center running. Meanwhile, HPC users also want to deploy cloud-like persistent services—databases, Jupyter notebooks, orchestration tools, Kubernetes clusters. Clouds have developed extensive APIs, security layers, and automation to enable these capabilities, but few options exist to deploy fully virtualized, automated cloud environments on-premises. The Oxide Cloud Computer allows organizations to deliver secure cloud computing capabilities within an on-premises environment.

On-premises environments are the next frontier for cloud computing. LLNL is tackling some of the hardest and most important problems in science and technology, requiring advanced hardware, software, and cloud capabilities. We are thrilled to be working with their exceptional team to help advance those efforts, delivering an integrated system that meets their rigorous requirements for performance, efficiency, and security.
— Steve Tuck, CEO at Oxide Computer Company

Leveraging the new Oxide Cloud Computer, LLNL will enable staff to provision virtual machines (VMs) and services via self-service APIs, improving operations and modernizing aspects of system management. In addition, LLNL will use the Oxide rack as a proving ground for secure multi-tenancy and for smooth integration with the LLNL-developed Flux resource manager. LLNL plans to bring its users cloud-like Infrastructure-as-a-Service (IaaS) capabilities that work seamlessly with their HPC jobs, while maintaining security and isolation from other users. Beyond LLNL personnel, researchers at the Los Alamos National Laboratory and Sandia National Laboratories will also partner in many of the activities on the Oxide Cloud Computer.

We look forward to working with Oxide to integrate this machine within our HPC center. Oxide’s Cloud Computer will allow us to securely support new types of workloads for users, and it will be a proving ground for introducing cloud-like features to operational processes and user workflows. We expect Oxide’s open-source software stack and their transparent and open approach to development to help us work closely together.

— Todd Gamblin, Distinguished Member of Technical Staff at LLNL

Sandia is excited to explore the Oxide platform as we work to integrate on-premise cloud technologies into our HPC environment. This advancement has the potential to enable new classes of interactive and on-demand modeling and simulation capabilities.

— Kevin Pedretti, Distinguished Member of Technical Staff at Sandia National Laboratories

LLNL plans to work with Oxide on additional capabilities, including the deployment of additional Cloud Computers in its environment. Of particular interest are scale-out capabilities and disaster recovery. The latest installation underscores Oxide Computer’s momentum in the federal technology ecosystem, providing reliable, state-of-the-art Cloud Computers to support critical IT infrastructure.

To learn more about Oxide Computer, visit https://oxide.computer.

About Oxide Computer

Oxide Computer Company is the creator of the world’s first commercial Cloud Computer, a true rack-scale system with fully unified hardware and software, purpose-built to deliver hyperscale cloud computing to on-premises data centers. With Oxide, organizations can fully realize the economic and operational benefits of cloud ownership, with access to the same self-service development experience of public cloud, without the public cloud cost. Oxide empowers developers to build, run, and operate any application with enhanced security, latency, and control, and frees organizations to elevate IT operations to accelerate strategic initiatives. To learn more about Oxide’s Cloud Computer, visit oxide.computer.

About LLNL

Founded in 1952, Lawrence Livermore National Laboratory provides solutions to our nation’s most important national security challenges through innovative science, engineering, and technology. Lawrence Livermore National Laboratory is managed by Lawrence Livermore National Security, LLC for the U.S. Department of Energy’s National Nuclear Security Administration.

Media Contact

LaunchSquad for Oxide Computer oxide@launchsquad.com

Remembering Charles Beeler Oxide Computer Company Blog

We are heartbroken to relay that Charles Beeler, a friend and early investor in Oxide, passed away in September after a battle with cancer. We lost Charles far too soon; he had a tremendous influence on the careers of us both.

Our relationship with Charles dates back nearly two decades, to his involvement with the ACM Queue board where he met Bryan. It was unprecedented to have a venture capitalist serve in this capacity with ACM, and Charles brought an entirely different perspective on the practitioner content. A computer science pioneer who also served on the board took Bryan aside at one point: "Charles is one of the good ones, you know."

When Bryan joined Joyent a few years later, Charles also got to know Steve well. Seeing the promise in both node.js and cloud computing, Charles became an investor in the company. When companies hit challenging times, some investors will hide — but Charles was the kind of investor to figure out how to fix what was broken. When Joyent needed a change in executive leadership, it was Charles who not only had the tough conversations, but led the search for the leader the company needed, ultimately positioning the company for success.

Aside from his investment in Joyent, Charles was an outspoken proponent of node.js, becoming an organizer of the Node Summit conference. In 2017, he asked Bryan to deliver the conference’s keynote, but by then, the relationship between Joyent and node.js had become…​ complicated, and Bryan felt that it probably wouldn’t be a good idea. Any rational person would have dropped it, but Charles persisted, with characteristic zeal: if the Joyent relationship with node.js had become strained, so much more the reason to speak candidly about it! Charles prevailed, and the resulting talk, Platform as Reflection of Values, became one of Bryan’s most personally meaningful talks.

Charles’s persistence was emblematic: he worked behind the scenes to encourage people to do their best work, always with an enthusiasm for the innovators and the creators. As we were contemplating Oxide, we told Charles what we wanted to do long before we had a company. Charles laughed with delight: "I hoped that you two would do something big, and I am just so happy for you that you’re doing something so ambitious!"

As we raised seed capital, we knew that we were likely a poor fit for Charles and his fund. But we also knew that we deeply appreciated his wisdom and enthusiasm; we couldn’t resist pitching him on Oxide. Charles approached the investment in Oxide as he did with so many other aspects: with curiosity, diligence, empathy, and candor. He was direct with us that despite his enthusiasm for us personally, Oxide would be a challenging investment for his firm. But he also worked with us to address specific objections, and ultimately he won over his partnership. We were thrilled when he not only invested, but pulled together a syndicate of like-minded technologists and entrepreneurs to join him.

Ever since, he has been a huge Oxide fan. Befitting that enthusiasm, one of his final posts expressed his pride in what the Oxide team has built.

Charles, thank you. You told us you were proud of us — and it meant the world. We are gutted to no longer have you with us; your influence lives on not just in Oxide, but also in the many people that you have inspired. You were the best of venture capital. Closer to the heart, you were a terrific friend to us both; thank you.

Debugging an OpenJDK crash on SPARC The Trouble with Tribbles...

I had to spend a little time recently fixing a crash in OpenJDK on Solaris SPARC.

What we're seeing is, from the hs_err file:

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0xffffffff57c745a8, pid=18442, tid=37
...
# Problematic frame:
# V  [libjvm.so+0x7745a8]  G1CollectedHeap::allocate_new_tlab(unsigned long, unsigned long, unsigned long*)+0xb8

Well that's odd. I only see this on SPARC, and I've seen it sporadically on Tribblix during the process of continually building OpenJDK on SPARC, but haven't seen it on Solaris. Until a customer hit it in production, which is rather a painful place to find a reproducer.

In terms of source, this is located in the file src/hotspot/share/gc/g1/g1CollectedHeap.cpp (all future source references will be relative to that directory), and looks like:

HeapWord* G1CollectedHeap::allocate_new_tlab(size_t min_size,
                                             size_t requested_size,
                                             size_t* actual_size) {
  assert_heap_not_locked_and_not_at_safepoint();
  assert(!is_humongous(requested_size), "we do not allow humongous TLABs");

  return attempt_allocation(min_size, requested_size, actual_size);
}

That's incredibly simple. There's not much that can go wrong there, is there?

The complexity here is that a whole load of functions get inlined. So what does it call? You find yourself in a twisty maze of passages, all alike. But anyway, the next one down is

inline HeapWord* G1CollectedHeap::attempt_allocation(size_t min_word_size,
                                                     size_t desired_word_size,
                                                     size_t* actual_word_size) {
  assert_heap_not_locked_and_not_at_safepoint();
  assert(!is_humongous(desired_word_size), "attempt_allocation() should not "
         "be called for humongous allocation requests");

  HeapWord* result = _allocator->attempt_allocation(min_word_size, desired_word_size, actual_word_size);

  if (result == NULL) {
    *actual_word_size = desired_word_size;
    result = attempt_allocation_slow(desired_word_size);
  }

  assert_heap_not_locked();
  if (result != NULL) {
    assert(*actual_word_size != 0, "Actual size must have been set here");
    dirty_young_block(result, *actual_word_size);
  } else {
    *actual_word_size = 0;
  }

  return result;
}

That then calls an inlined G1Allocator::attempt_allocation() in g1Allocator.hpp. That calls current_node_index(), which looks safe, and then makes a couple of calls to mutator_alloc_region()->attempt_retained_allocation() and mutator_alloc_region()->attempt_allocation(). Those come from g1AllocRegion.inline.hpp, and both ultimately call a local par_allocate(), which in turn calls par_allocate_impl() or par_allocate() in heapRegion.inline.hpp.

Now, mostly all these functions do is call something else. The one really complex piece of code is in par_allocate_impl(), which contains

...
  do {
    HeapWord* obj = top();
    size_t available = pointer_delta(end(), obj);
    size_t want_to_allocate = MIN2(available, desired_word_size);
    if (want_to_allocate >= min_word_size) {
      HeapWord* new_top = obj + want_to_allocate;
      HeapWord* result = Atomic::cmpxchg(&_top, obj, new_top);
      // result can be one of two:
      //  the old top value: the exchange succeeded
      //  otherwise: the new value of the top is returned.
      if (result == obj) {
        assert(is_object_aligned(obj) && is_object_aligned(new_top), "checking alignment");
        *actual_size = want_to_allocate;
        return obj;
      }
    } else {
      return NULL;
    }
  } while (true);
}

Right, let's go back to the crash. We can open up the core file in
mdb, and look at the stack with $C

ffffffff7f39d751 libjvm.so`_ZN7VMError14report_and_dieEP6ThreadjPhPvS3_+0x3c(
    101cbb1d0?, b?, fffffffcb45dea7c?, ffffffff7f39ecb0?, ffffffff7f39e9a0?, 0?)
ffffffff7f39d811 libjvm.so`JVM_handle_solaris_signal+0x1d4(b?,
    ffffffff7f39ecb0?, ffffffff7f39e9a0?, 0?, ffffffff7f39e178?, 101cbb1d0?)
ffffffff7f39dde1 libjvm.so`_ZL17javaSignalHandleriP7siginfoPv+0x20(b?,
    ffffffff7f39ecb0?, ffffffff7f39e9a0?, 0?, 0?, ffffffff7e7dd370?)
ffffffff7f39de91 libc.so.1`__sighndlr+0xc(b?, ffffffff7f39ecb0?,
    ffffffff7f39e9a0?, fffffffcb4b38afc?, 0?, ffffffff7f20c7e8?)
ffffffff7f39df41 libc.so.1`call_user_handler+0x400((int) -1?,
    (siginfo_t *) 0xffffffff7f39ecb0?, (ucontext_t *) 0xc?)
ffffffff7f39e031 libc.so.1`sigacthandler+0xa0((int) 11?,
    (siginfo_t *) 0xffffffff7f39ecb0?, (void *) 0xffffffff7f39e9a0?)
ffffffff7f39e5b1 libjvm.so`_ZN15G1CollectedHeap17allocate_new_tlabEmmPm+0xb8(
    10013d030?, 100?, 520?, ffffffff7f39f000?, 0?, 0?)

What you see here is allocate_new_tlab() at the bottom: it throws a signal, the signal handler catches it and ultimately passes it to JVM_handle_solaris_signal(), which bails, and the JVM exits.

We can look at the signal. It's at address 0xffffffff7f39ecb0 and is of type siginfo_t, so we can just print it

java:core> ffffffff7f39ecb0::print -t siginfo_t

and we first see

siginfo_t {
    int si_signo = 0t11 (0xb)
    int si_code = 1
    int si_errno = 0
...

OK, the signal was indeed 11 = SIGSEGV. The interesting thing is the si_code of 1, which is defined as

#define SEGV_MAPERR     1       /* address not mapped to object */

Ah. Now, in the jvm you actually see a lot of SIGSEGV, but a lot of them are handled by that mysterious JVM_handle_solaris_signal(). In particular, it'll handle anything with SEGV_ACCERR, which is basically something running off the end of an array.

Further down, you can see the fault address

struct  __fault = {
            void *__addr = 0x10
            int __trapno = 0
            caddr_t __pc = 0
            int __adivers = 0
        }

So, we're faulting on address 0x10. Yes, you try messing around down there and you will fault.


That confirms the crash is a SEGV. What are we actually trying to do? We can disassemble the allocate_new_tlab() function and see what's happening - remember the crash was at offset 0xb8

java:core> libjvm.so`_ZN15G1CollectedHeap17allocate_new_tlabEmmPm::dis
...
 libjvm.so`_ZN15G1CollectedHeap17allocate_new_tlabEmmPm+0xb8:  ldx   [%i4 + 0x10], %i5

That's interesting, 0x10 was the fault address. What's %i4 then?

java:core> ::regs
%i4 = 0x0000000000000000

Yep. Given that, we try to read address 0x10, giving the SEGV we see.

There's a little more context around that call site. A slightly
expanded view is

 libjvm.so`_ZN15G1CollectedHeap17allocate_new_tlabEmmPm+0xa0:  nop
 libjvm.so`_ZN15G1CollectedHeap17allocate_new_tlabEmmPm+0xa4:  add    %i5, %g1, %g1
 libjvm.so`_ZN15G1CollectedHeap17allocate_new_tlabEmmPm+0xa8:  casx   [%g3], %i5, %g1
 libjvm.so`_ZN15G1CollectedHeap17allocate_new_tlabEmmPm+0xac:  cmp    %i5, %g1
 libjvm.so`_ZN15G1CollectedHeap17allocate_new_tlabEmmPm+0xb0:  be,pn  %xcc, +0x160  <libjvm.so`_ZN15G1CollectedHeap17allocate_new_tlabEmmPm+0x210>
 libjvm.so`_ZN15G1CollectedHeap17allocate_new_tlabEmmPm+0xb4:  nop
 libjvm.so`_ZN15G1CollectedHeap17allocate_new_tlabEmmPm+0xb8:  ldx    [%i4 + 0x10], %i5

Now, the interesting thing here is the casx (compare and swap) instruction. That lines up with the Atomic::cmpxchg() in par_allocate_impl() that we were suspecting above. So the crash is somewhere around there.

It turns out there's another way to approach this. If we compile without optimization then effectively we turn off the inlining. The way to do this is to add an entry to the jvm Makefile via make/hotspot/lib/JvmOverrideFiles.gmk

...
else ifeq ($(call isTargetOs, solaris), true)
    ifeq ($(call isTargetCpuArch, sparc), true)
      # ptribble port tweaks
      BUILD_LIBJVM_g1CollectedHeap.cpp_CXXFLAGS += -O0
    endif
endif

If we rebuild (having touched all the files in the directory to force
make to rebuild everything correctly) and run again, we get the full
call stack.

Now the crash is

# V  [libjvm.so+0x80cc48]  HeapRegion::top() const+0xc

which we can expand to the following stack leading up to where it goes
into the signal handler:

ffffffff7f39dff1 libjvm.so`_ZNK10HeapRegion3topEv+0xc(0?, ffffffff7f39ef40?,
    101583e38?, ffffffff7f39f020?, fffffffa46de8038?, 10000?)
ffffffff7f39e0a1 libjvm.so`_ZN10HeapRegion17par_allocate_implEmmPm+0x18(0?,
    100?, 10000?, ffffffff7f39ef60?, ffffffff7f39ef40?, 8f00?)
ffffffff7f39e181 libjvm.so`_ZN10HeapRegion27par_allocate_no_bot_updatesEmmPm+0x24(0?, 100?,
    10000?, ffffffff7f39ef60?, 566c?, 200031?)
ffffffff7f39e231 libjvm.so`_ZN13G1AllocRegion12par_allocateEP10HeapRegionmmPm+0x44(100145440?,
    0?, 100?, 10000?, ffffffff7f39ef60?, 0?)
ffffffff7f39e2e1 libjvm.so`_ZN13G1AllocRegion18attempt_allocationEmmPm+0x48(
    100145440?, 100?, 10000?, ffffffff7f39ef60?, 3?, fffffffa46ceff48?)
ffffffff7f39e3a1 libjvm.so`_ZN11G1Allocator18attempt_allocationEmmPm+0xa4(
    1001453b0?, 100?, 10000?, ffffffff7f39ef60?, 7c0007410?, ffffffff7f39ea41?)
ffffffff7f39e461 libjvm.so`_ZN15G1CollectedHeap18attempt_allocationEmmPm+0x2c(
    10013d030?, 100?, 10000?, ffffffff7f39ef60?, 7c01b15e8?, 0?)
ffffffff7f39e521 libjvm.so`_ZN15G1CollectedHeap17allocate_new_tlabEmmPm+0x24(
    10013d030?, 100?, 10000?, ffffffff7f39ef60?, 0?, 0?)

So yes, this confirms that we are indeed in par_allocate_impl() and
it's crashing on the very first line of the code segment I showed
above, where it calls top(). All top() does is return the _top member
of a HeapRegion.

So the only thing that can happen here is that the HeapRegion pointer
itself is NULL. The _top member is presumably at offset 0x10, and trying
to access it through that NULL pointer gives the SIGSEGV.

Now, in G1AllocRegion::attempt_allocation() there's an assert:

  HeapRegion* alloc_region = _alloc_region;
  assert_alloc_region(alloc_region != NULL, "not initialized properly");

However, asserts aren't compiled into production builds.

But the fix here is to fail if we've got NULL and let the caller
retry. There are a lot of calls here, and the general approach is to
return NULL if anything goes wrong, so I do the same for this extra
failure case, adding the following:

  if (alloc_region == NULL) {
    return NULL;
  }
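
To make the placement concrete, here is a sketch (not the verbatim upstream code) of how the start of G1AllocRegion::attempt_allocation() in g1AllocRegion.inline.hpp looks with the guard added:

inline HeapWord* G1AllocRegion::attempt_allocation(size_t min_word_size,
                                                   size_t desired_word_size,
                                                   size_t* actual_word_size) {
  HeapRegion* alloc_region = _alloc_region;
  assert_alloc_region(alloc_region != NULL, "not initialized properly");

  // The assert above is compiled out of product builds, so bail out
  // explicitly rather than dereference a NULL region; callers already
  // treat NULL as "allocation failed, fall back to the slow path".
  if (alloc_region == NULL) {
    return NULL;
  }

  // ... the rest of the allocation attempt via par_allocate() continues
  // as before ...
}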

With that, no more of those pesky crashes. (There might be others
lurking elsewhere, of course.)

Of course, what this doesn't explain is why the HeapRegion wasn't
correctly initialized in the first place. But that's another problem
entirely.

How Oxide Cuts Data Center Power Consumption in Half Oxide Computer Company Blog

Here’s a sobering thought: today, data centers already consume 1-2% of the world’s power, and that percentage will likely rise to 3-4% by the end of the decade. According to Goldman Sachs research, that rise will include a doubling in data center carbon dioxide emissions. As the data and AI boom progresses, this thirst for power shows no signs of slowing down anytime soon. Two key challenges quickly become evident for the 85% of IT that currently lives on-premises.

  1. How can organizations reduce power consumption and corresponding carbon emissions?

  2. How can organizations keep pace with AI innovation as existing data centers run out of available power?

Graph of AI & Data Center Growth Boosting Electricity Demand
Figure 1. Masanet et al. (2020), Cisco, IEA, Goldman Sachs Research

Rack-scale design is critical to improved data center efficiency

Traditional data center IT consumes so much power because the fundamental unit of compute is an individual server; like a house where rooms were built one at a time, with each room having its own central AC unit, gas furnace, and electrical panel. Individual rackmount servers are stacked together, each with their own AC power supplies, cooling fans, and power management. They are then paired with storage appliances and network switches that communicate at arm’s length, not designed as a cohesive whole. This approach fundamentally limits organizations' ability to maintain sustainable, high-efficiency computing systems.

Of course, hyperscale public cloud providers did not design their data center systems this way. Instead, they operate like a carefully planned smart home where everything is designed to work together cohesively and is operated by software that understands the home’s systems end-to-end. High-efficiency, rack-scale computers are deployed at scale and operate as a single unit with integrated storage and networking to support elastic cloud computing services. This modern architecture is made available to the market as public cloud, but that rental-only model is ill-suited to many business needs.

Illustration of Oxide racks at a higher density (2x) than conventional ones

Compared to a popular rackmount server vendor, Oxide is able to fill our specialized racks with 32 AMD Milan sleds and highly-available network switches using less than 15kW per rack, doubling the compute density in a typical data center. With just 16 of the alternative 1U servers and equivalent network switches, over 16kW of power is required per rack, leading to only 1,024 CPU cores vs Oxide’s 2,048.
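
Put roughly into numbers using the figures above (the comparison's nominal values, not measured data):

Oxide rack:  2,048 cores / ~15 kW ≈ 137 cores per kW
Legacy rack: 1,024 cores / ~16 kW ≈  64 cores per kW

That is slightly more than twice the useful compute per kilowatt, consistent with the doubled density.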

Extracting more useful compute from each kW of power and square foot of data center space is key to the future effectiveness of on-premises computing.

At Oxide, we’ve taken this lesson in advancing rack-scale design, improved upon it in several ways, and made it available for every organization to purchase and operate anywhere in the world without a tether back to the public cloud. Our Cloud Computer treats the entire rack as a single, unified computer rather than a collection of independent parts, achieving unprecedented power efficiency.

By designing the hardware and software together, we’ve eliminated unnecessary components and optimized every aspect of system operation through a control plane with visibility to end-to-end operations.

When we started Oxide, the DC bus bar stood as one of the most glaring differences between the rack-scale machines at the hyperscalers and the rack-and-stack servers that the rest of the market was stuck with. That a relatively simple piece of copper was unavailable to commercial buyers — despite being unequivocally the right way to build it! — represented everything wrong with the legacy approach.

The bus bar in the Oxide Cloud Computer is not merely more efficient, it is a concrete embodiment of the tremendous gains from designing at rack-scale, and by integrating hardware with software.

— Bryan Cantrill

The improvements we’re seeing are rooted in technical innovation

  • Replacing low-efficiency AC power supplies with a high-efficiency DC Bus Bar
    Power conversion is performed once: AC power from the data center feeds the Oxide universal power shelf, with its customized power shelf controller (PSC), and the shelf distributes DC power up and down the rack via a bus bar. This eliminates the 70 AC power supplies found in an equivalent legacy server rack (32 servers, two top-of-rack switches, and one out-of-band switch, each with two AC power supplies). The power shelf also ensures the load is balanced across phases, something that’s impossible with the traditional power distribution units found in legacy server racks.

  • Bigger fans = bigger efficiency gains
    Oxide server sleds are designed to a custom form factor to accommodate larger fans than legacy servers typically use. These fans can move more air more efficiently, cooling the systems using 12x less energy than legacy servers, each of which contains as many as 7 fans that must work much harder to move air over system components.

  • Purpose-built for power efficiency
    Oxide server sleds have less restrictive airflow than legacy servers by eliminating extraneous components like PCIe risers, storage backplanes, and more. Legacy servers need many optional components like these because they could be used for any number of tasks, such as point-of-sale systems, data center servers, or network-attached-storage (NAS) systems. Still, they were never designed optimally for any one of those tasks. The Oxide Cloud Computer was designed from the ground up to be a rack-scale cloud computing powerhouse, and so it’s optimized for exactly that task.

  • Hardware + Software designed together
    The Oxide Cloud Computer includes a robust cloud control plane with deep observability to the full system. By designing the hardware and software together, we can make hardware choices like more intelligent DC-DC power converters that can provide rich telemetry to our control plane, enabling future feature enhancements such as dynamic power capping and efficiency-based workload placement that are impossible with legacy servers and software systems.

Learn more about Oxide’s intelligent Power Shelf Controller

The Bottom Line: Customers and the Environment Both Benefit

Reducing data center power demands and achieving more useful computing per kilowatt requires fundamentally rethinking traditional data center utilization and compute design. At Oxide, we’ve proven that dramatic efficiency gains are possible when you rethink the computer at rack-scale with hardware and software designed thoughtfully and rigorously together.

Ready to learn how your organization can achieve these results? Schedule time with our team here.

Together, we can reclaim on-premises computing efficiency to achieve both business and sustainability goals.

OmniOS Community Edition r151052 OmniOS Community Edition

OmniOSce v11 r151052 is out!

On the 4th of November 2024, the OmniOSce Association released a new stable version of OmniOS - The Open Source Enterprise Server OS. The release comes with many tool updates, brand-new features and additional hardware support. For details see the release notes.

Note that r151048 is now end-of-life. You should upgrade to r151050 or r151052 to stay on a supported track.

OmniOS is fully Open Source and free. Nevertheless, it takes a lot of time and money to keep maintaining a full-blown operating system distribution. Our statistics show that there are almost 2’000 active installations of OmniOS while fewer than 20 people send regular contributions. If your organisation uses OmniOS based servers, please consider becoming a regular patron or taking out a support contract.


Any problems or questions, please get in touch.

Reflections on Founder Mode The Observation Deck

Paul Graham’s Founder Mode is an important piece, and you should read it if for no other reason than that “founder mode” will surely enter the lexicon (and as Graham grimly predicts: “as soon as the concept of founder mode becomes established, people will start misusing it”). When building a company, founders are engaged in several different acts at once: raising capital; building a product; connecting that product to a market; building an organization to do all of these. Founders make lots of mistakes in all of these activities, and Graham’s essay highlights a particular kind of mistake in which founders are overly deferential to expertise or convention. Pejoratively referring to this as “Management Mode”, Graham frames this in the Silicon Valley dramaturgical dyad of Steve Jobs and John Sculley. While that’s a little too reductive (anyone seeking to understand Jobs needs to read Randall Stross’s superlative Steve Jobs and the NeXT Big Thing, highlighting Jobs’s many post-Sculley failures at NeXT), Graham has identified a real issue here, albeit without much specificity.

For a treatment of the same themes but with much more supporting detail, one should read the (decade-old) piece from Tim O’Reilly, How I failed. (Speaking personally, O’Reilly’s piece had a profound influence on me, as it encouraged me to stand my ground on an issue on which I had my own beliefs but was being told to defer to convention.) But as terrific as it is, O’Reilly’s piece also doesn’t answer the question that Graham poses: how do founders prevent their companies from losing their way?

Graham says that founder mode is a complete mystery (“There are as far as I know no books specifically about founder mode”), and while there is a danger in being too pat or prescriptive, there does seem to be a clear component for keeping companies true to themselves: the written word. That is, a writing- (and reading-!) intensive company culture does, in fact, allow for scaling the kind of responsibility that Graham thinks of as founder mode. At Oxide, our writing-intensive culture has been absolutely essential: our RFD process is the backbone of Oxide, and has given us the structure to formalize, share, and refine our thinking. First among this formalized thinking – and captured in our first real RFD – is RFD 2 Mission, Principles, and Values. Immediately behind that (and frankly, the most important process for any company) is RFD 3 Oxide Hiring Process. These first three RFDs – on the process itself, on what we value, and on how we hire – were written in the earliest days of the company, and they have proven essential to scale the company: they are the foundation upon which we attract people who share our values.

While the shared values have proven necessary, they haven’t been sufficient to eliminate the kind of quandaries that Graham and O’Reilly describe. For example, there have been some who have told us that we can’t possibly hire non-engineering roles using our hiring process – or told us that our approach to compensation can’t possibly work. To the degree that we have had a need for Graham’s founder mode, it has been in those moments: to stay true to the course we have set for the company. But because we have written down so much, there is less occasion for this than one might think. And when it does occur – when there is a need for further elucidation or clarification – the artifact is not infrequently a new RFD that formalizes our newly extended thinking. (RFD 68 is an early public and concrete example of this; RFD 508 is a much more recent one that garnered some attention.)

Most importantly, because we have used our values as a clear lens for hiring, we are able to assure that everyone at Oxide is able to have the same disposition with respect to responsibility – and this (coupled with the transparency that the written word allows) permits us to trust one another. As I elucidated in Things I Learned The Hard Way, the most important quality in a leader is to bind a team with mutual trust: with it, all things are possible – and without it, even easy things can be debilitatingly difficult. Graham mentions trust, but he doesn’t give it its due. Too often, founders focus on the immediacy of a current challenge without realizing that they are, in fact, undermining trust with their approach. Bluntly, founders are at grave risk of misinterpreting Graham’s “Founders Mode” to be a license to micromanage their teams, descending into the kind of manic seagull management that inhibits a team rather than empowering it.

Founders seeking to internalize Graham’s advice should recast it by asking themselves how they can foster mutual trust – and how they can build the systems that allow trust to be strengthened even as the team expands. For us at Oxide, writing is the foundation upon which we build that trust. Others may land on different mechanisms, but the goal of founders should be the same: build the trust that allows a team to kick a Jobsian dent in the universe!

KORH Minimum Sector Altitude Gotcha Josef "Jeff" Sipek

I had this draft around for over 5 years—since January 2019. Since I still think it is about an interesting observation, I’m publishing it now.

In late December (2018), I was preparing for my next instrument rating lesson which was going to involve a couple of ILS approaches at Worcester, MA (KORH). While looking over the ILS approach to runway 29, I noticed something about the minimum sector altitude that surprised me.

Normally, I consider MSAs to be centered near the airport for the approach. For conventional (i.e., non-RNAV) approaches, this tends to be the main navaid used during the approach. At Worcester, the 25 nautical mile MSA is centered on the Gardner VOR which is 19 nm away.

I plotted the MSA boundary on the approach chart to visualize it better:

It is easy to glance at the chart, see 3300 most of the way around, and not realize that when flying in the vicinity of the airport we are near the edge of the MSA. GRIPE, the missed approach holding fix, is half a mile outside the MSA. (Following the missed approach procedure still provides plenty of obstacle clearance, of course, so this isn’t really that relevant.)

What's a decent password length? The Trouble with Tribbles...

What's a decent length for a password?

I think it's pretty much agreed by now that longer passwords are, in general, better. And fortunately stupid complexity requirements are on the way out.

Reading the NIST password rules gives the following:

  • User chosen passwords must be at least 8 characters
  • Machine chosen passwords must be at least 6 characters
  • You must allow passwords to be at least 64 characters long (i.e., you cannot cap length below 64)

Say what? A 6 character password is secure?

Initially, that seems way off, but it depends on your threat model. If you have a mechanism to block the really bad commonly used passwords, then 6 characters gives you a billion choices. Not many, but you should also be implementing technical measures such as rate limiting.

With that, if the only attack vector is brute force over the network, trying a billion passwords is simply impractical. Even with just passive rate limiting (limited by cpu power and network latency) an attacker will struggle; with active limiting they'll be trying for decades.
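
As a rough back-of-the-envelope check (assuming machine-chosen passwords drawn from a 36- or 62-symbol alphabet, and an attacker limited to roughly one guess per second):

36^6 ≈ 2.2 × 10^9 combinations; 62^6 ≈ 5.7 × 10^10 combinations
10^9 guesses at 1 guess/second ≈ 10^9 seconds ≈ 32 years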

That's with just 6 random characters. Go to 8 and you're out of sight. And for this attack vector, no quantum computing developments will make any difference whatsoever.

But what if the user database itself is compromised?

Of course, if the passwords are in cleartext then no amount of fancy rules or length requirements is going to help you at all.

But if an attacker gets encrypted passwords then they can simply brute force them many orders of magnitude faster. Or use rainbow tables. And that's a whole different threat model.

Realistically, protecting against brute force or rainbow table attacks probably needs a 16 character password (or passphrase), and that requirement could get longer over time.

A corollary to this is that there isn't actually much to be gained by requiring password lengths between 8 and 16 characters.

In illumos, the default minimum password length is 6 characters. I recently increased the default in Tribblix to 8, which aligns with the minimum that NIST gives for user-chosen passwords.
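
For reference, on illumos distributions this minimum is controlled by the PASSLENGTH setting in /etc/default/passwd (a sketch of the relevant line; check your distribution's shipped defaults):

# /etc/default/passwd
PASSLENGTH=8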

OmniOS Community Edition r151050 OmniOS Community Edition

OmniOSce v11 r151050 is out!

On the 6th of May 2024, the OmniOSce Association released a new stable version of OmniOS - The Open Source Enterprise Server OS. The release comes with many tool updates, brand-new features and additional hardware support. For details see the release notes.

Note that r151038, the previous LTS release, is now end-of-life. You should upgrade to r151046 or r151050 to stay on a supported track: r151046 is an LTS release with support until May 2026, and r151050 is a stable release with support until May 2025. For anyone who tracks LTS releases, upgrading to r151046 provides continued LTS support.

OmniOS is fully Open Source and free. Nevertheless, it takes a lot of time and money to keep maintaining a full-blown operating system distribution. Our statistics show that there are almost 2’000 active installations of OmniOS while fewer than 20 people send regular contributions. If your organisation uses OmniOS based servers, please consider becoming a regular patron or taking out a support contract.


Any problems or questions, please get in touch.

Unsynchronized PPS Experiment Josef "Jeff" Sipek

Late last summer I decided to do a simple experiment—feed my server a PPS signal that wasn’t synchronized to any timescale. The idea was to give chrony a reference that is more stable than the crystal oscillator on the motherboard.

Hardware

For this PPS experiment I decided to avoid all control loop/feedback complexity and just manually set the frequency to something close enough and let it drift—hence the unsynchronized. As a result, the circuit was quite simple:

The OCXO was a $5 used part from eBay. It outputs a 10 MHz square wave and has a control voltage pin that lets you tweak the frequency a little bit. By playing with it, I determined that a 10mV control voltage change yielded about a 0.1 Hz frequency change. The trimmer sets this reference voltage. To “calibrate” it, I connected it to a frequency counter and tweaked the trimmer until the counter read exactly 10 MHz.

10 MHz is obviously way too fast for a PPS signal. The simplest way to turn it into a PPS signal is to use an 8-bit microcontroller. The ATmega48P’s design seems to have very deterministic timing (in other words it adds a negligible amount of jitter), so I used it at 10 MHz (fed directly from the OCXO) with a very simple assembly program to toggle an output pin on and off. The program kept an output pin high for exactly 2 million cycles, and low for 8 million cycles thereby creating a 20% duty cycle square wave at 1 Hz…perfect to use as a PPS. Since the jitter added by the microcontroller is measured in picoseconds it didn’t affect the overall performance in any meaningful way.
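
The cycle counting works out exactly:

2,000,000 cycles high + 8,000,000 cycles low = 10,000,000 cycles ÷ 10 MHz = 1 s
2,000,000 / 10,000,000 = 20% duty cycle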

The ATmega48P likes to run at 5V and therefore its PPS output is +5V/0V, which isn’t compatible with a PC serial port. I happened to have an ADM3202 on hand so I used it to convert the 5V signal to an RS-232 compatible signal. I didn’t do as thorough of a check of its jitter characteristics, but I didn’t notice anything bad while testing the circuit before “deploying” it.

Finally, I connected the RS-232 compatible signal to the DCD pin (but CTS would have worked too).

The whole circuit was constructed on a breadboard with the OCXO floating in the air on its wires. Power was supplied with an iPhone 5V USB power supply. Overall, it was a very quick and dirty construction to see how well it would work.

Software

My server runs FreeBSD with chrony as the NTP daemon. The configuration is really simple.

First, setting dev.uart.0.pps_mode to 2 informs the kernel that the PPS signal is on DCD (see uart(4)).
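
For example, it can be set at runtime with sysctl (and made persistent via sysctl.conf or loader.conf; the uart instance number depends on which port the signal is wired to):

# capture PPS on the DCD pin of uart0
sysctl dev.uart.0.pps_mode=2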

Second, we need to tell chrony that there is a local PPS on the port:


refclock PPS /dev/cuau0 local 

The local token is important. It tells chrony that the PPS is not synchronized to UTC. In other words, that the PPS can be used as a 1 Hz frequency source but not as a phase source.

Performance

I ran my server with this PPS refclock for about 50 days with chrony configured to log the time offset of each pulse and to apply filtering to every 16 pulses. (This removes some of the errors related to serial port interrupt handling not being instantaneous.) The following evaluation uses only these filtered samples as well as the logged data about the calculated system time error.

In addition to the PPS, chrony used several NTP servers from the internet (including the surprisingly good time.cloudflare.com) for the date and time-of-day information. This is a somewhat unfortunate situation when it comes to trying to figure out how good of an oscillator the OCXO is, as to make good conclusions about one oscillator one needs a better quality oscillator for the comparison. However, there are still a few things one can look at even when the (likely) best oscillator is the one being tested.

NTP Time Offset

The ultimate goal of a PPS source is to stabilize the system’s clock. Did the PPS source help? I think it is easy to answer that question by looking at the remaining time offset (column 11 in chrony’s tracking.log) over time.

This is a plot of 125 days that include the 50 days when I had the PPS circuit running. You can probably guess which 50 days. (The x-axis is time expressed as Modified Julian Date, or MJD for short.)

I don’t really have anything to say aside from—wow, what a difference!

For completeness, here’s a plot of the estimated local offset at the epoch (column 7 in tracking.log). My understanding of the difference between the two columns is fuzzy but regardless of which I go by, the improvement was significant.

Fitting a Polynomial Model

In addition to looking at the whole-system performance, I wanted to look at the PPS performance itself.

As before, the x-axis is MJD. The y-axis is the PPS offset as measured and logged by chrony—the 16-second filtered values.

The offset started at -486.5168ms. This is an arbitrary offset that simply shows that I started the PPS circuit about half a second off of UTC. Over the approximately 50 days, the offset grew to -584.7671ms.

This means that the OCXO frequency wasn’t exactly 10 MHz (and therefore the 1 PPS wasn’t actually at 1 Hz). Since there is a visible curve to the line, it isn’t a simple fixed frequency error but rather the frequency drifted during the experiment.

How much? I used R’s lm function to fit simple polynomials to the collected data. I tried a few different polynomial degrees, but all of them were fitted the same way:


m <- lm(pps_offset ~ poly(time, poly_degree, raw=TRUE))
a <- as.numeric(m$coefficients[1])
b <- as.numeric(m$coefficients[2])
c <- as.numeric(m$coefficients[3])
d <- as.numeric(m$coefficients[4])

In all cases, these coefficients correspond to the 4 terms in a + bt + ct² + dt³. For lower-degree polynomials, the missing coefficients are 0.

Note: Even though the plots show the x-axis in MJD, the calculations were done in seconds with the first data point at t=0 seconds.

Linear

The simplest model is a linear one. In other words, fitting a straight line through the data set. lm provided the following coefficients:

a=-0.480090626569894
b=-2.25787872135774e-08

That is an offset of -480.09ms and slope of -22.58ns/s (which is also -22.58 ppb frequency error).

Graphically, this is what the line looks like when overlaid on the measured data:

Not bad but also not great. Here is the difference between the two:

Put another way, this is the PPS offset from UTC if we correct for a time offset (a) and a frequency error (b). The linear model clearly doesn’t handle the structure in the data completely. The residual is in the low single-digit milliseconds. We can do better, so let’s try to add another term.

Quadratic

lm produced these coefficients for a degree 2 polynomial:

a=-0.484064700277606
b=-1.75349684277379e-08
c=-1.10412099841665e-15

Visually, this fits the data much better. It’s a little wrong on the ends, but overall quite nice. Even the residual (below) is smaller—almost completely confined to less than 1 millisecond.

a is still time offset, b is still frequency error, and c is a time “acceleration” of sorts.

There is still very visible structure to the residual, so let’s add yet another term.

Cubic

As before, lm yielded the coefficients. This time they were:

a=-0.485357232306569
b=-1.44068934233748e-08
c=-2.78676248986831e-15
d=2.45563844387287e-22

That’s really close looking!

The residual still has a little bit of a wave to it, but almost all the data points are within 500 microseconds. I think that’s sufficiently close given just how much non-deterministic “stuff” (both hardware and software) there is between a serial port and an OS kernel’s interrupt handler on a modern server. (In theory, we could add additional terms forever until we completely eliminated the residual.)

So, we have a model of what happened to the PPS offset over time. Specifically, a + bt + ct² + dt³ and the 4 constants. The offset (a of approximately -485ms) is easily explained—I started the PPS at the “wrong” time. The frequency error (b of approximately -14.4 ppb) can be explained as I didn’t tune the oscillator to exactly 10 MHz. (More accurately, I tuned it, unplugged it, moved it to my server, and plugged it back in. The slightly different environment could produce a few ppb error.)

What about the c and d terms? They account for a combination of a lot of things. Temperature is a big one. First of all, it is a home server and so it is subject to air-conditioner cycling on and off at a fairly long interval. This produces sizable swings in temperature, which in turn mess with the frequency. A server in a data center sees much less temperature variation, since the chillers keep the temperature essentially constant (at least compared to homes). Second, the oscillator was behind the server and I expect the temperature to slightly vary based on load.

One could no doubt do more analysis (and maybe at some point I will), but this post is already getting way too long.

Conclusion

One can go nuts trying to play with time and time synchronization. This is my first attempt at timekeeping-related circuitry, so I’m sure there are ways to improve the circuit or the analysis.

I think this experiment was a success. The system clock behavior improved beyond what’s needed for a general purpose server. Getting under 20 ppb error from a simple circuit on a breadboard with absolutely no control loop is great. I am, of course, already tinkering with various ideas that should improve the performance.

Tribblix image structural changes The Trouble with Tribbles...

The Tribblix live ISO and related images are put together ever so slightly differently in the latest m34 release.

All along, there's been an overlay (think of it as a group of packages) called base-iso that lists the packages that are present in the live image. On installation, this is augmented with a few extra packages that you would expect to be present in a running system but which don't make much sense in a live image, to construct the base system.

You can add additional software, but the base is assumed to be present.

The snag with this is that base-iso is very much a single-purpose generic concept. By its very nature it has to be minimal enough to not be overly bloated, yet contain as many drivers as necessary to handle the majority of systems.

As such, the regular ISO image has fallen between 2 stools - it doesn't have every single driver, so some systems won't work, while it has a lot of unnecessary drivers for a lot of common use cases.

So what I've done is split base-iso into 2 layers. There's a new core-tribblix overlay, which is the common packages, and then base-iso adds all the extra drivers. By and large, the regular live image for m34 isn't really any different to what was present before.

But the concepts of "what packages do I need for applications to work" and "what packages do I want to load on a given downloadable ISO" have now been split.

What this allows is to easily create other images with different rules. As of m34, for example, the "minimal" image is actually created from a new base-server overlay, which again sits atop core-tribblix and differs from base-iso in that it has all the FC drivers. If you're installing on a fibre-channel connected system then using the minimal image will work better (and if you're SAN-booted, it will work where the regular ISO won't).

The next use case is that images for cloud or virtual systems simply don't need most of the drivers. This cuts out a lot of packages (although it doesn't actually save that much space).

The standard Tribblix base system now depends on core-tribblix, not base-iso or any of the specific image layers. This is as it should be - userland and applications really shouldn't care what drivers are present.

One side-effect of this change is that it makes minimising zones easier, because what gets installed in a zone can be based on that stripped-down core-tribblix overlay.

Engineering a culture Oxide Computer Company Blog

We ran into an interesting issue recently. On the one hand, it was routine: we had a bug — a regression — and the team quickly jumped on it, getting it root caused and fixed. But on the other, this particular issue was something of an Oxide object lesson, representative not just of the technologies but also of the culture we have built here. I wasn’t the only person who thought so, and two of my colleagues wrote terrific blog entries with their perspectives:

The initial work as described by Matt represents a creative solution to a thorny problem; if it’s clear in hindsight, it certainly wasn’t at the time! (In Matt’s evocative words: "One morning, I had a revelation.") I first learned of Matt’s work when he demonstrated it during our weekly Demo Friday, an hour-long unstructured session to demo our work for one another. Demo Friday is such an essential part of Oxide’s culture that it feels like we have always done it, but in fact it took us nearly two years into the company’s life to get there: over the spring and summer of 2021, our colleague Sean Klein had instituted regular demos for the area that he works on (the Oxide control plane), and others around the company — seeing the energy that came from it — asked if they, too, could start regular demos for their domain. But instead of doing it group by group, we instituted it company-wide starting in the fall of 2021: an unstructured hour once a week in which anyone can demo anything.

In the years since, we have had demos of all scopes and sizes. Importantly, no demo is too small — and we have often found that a demo that feels small to someone in the thick of work will feel extraordinary to someone outside of it. ("I have a small demo building on the work of a lot of other people" has been heard so frequently that it has become something of an inside joke.) Demo Friday is important because it gets to one of our most important drivers as technologists: the esteem of our peers. The thrill that you get from showing work to your colleagues is unparalleled — and their wonderment in return is uniquely inspiring. (Speaking personally, Matt’s demo addressed a problem that I had personally had many times over in working on Hubris — and I was one of the many w00ts in the chat, excited to see his creative solution!)

Having the demos be company-wide has also been a huge win for not just our shared empathy and teamwork but also our curiosity and versatility: it’s really inspiring to have (say) one colleague show how they used PCB backdrilling for signal integrity, and the next show an integration they built using Dropshot between our CRM and spinning up a demonstration environment for a customer. And this is more than just idle intellectual curiosity: our stack is deep — spanning both hardware and software — and the demos make for a fun and engaging way to learn about aspects of the system that we don’t normally work on.

Returning to Matt and Cliff, if Matt’s work implicitly hits on aspects of our culture, Cliff’s story of debugging addresses that culture explicitly, noting that the experience demonstrated:

Tight nonhierarchical integration of the team. This isn’t a Hubris feature, but it’s hard to separate Hubris from the team that built it. Oxide’s engineering team has essentially no internal silos. Our culture rewards openness, curiosity, and communication, and discourages defensiveness, empire-building, and gatekeeping. We’ve worked hard to create and defend this culture, and I think it shows in the way we organized horizontally, across the borders of what other organizations would call teams, to solve this mystery.

In the Hacker News discussion of Cliff’s piece, this cultural observation stood out, with a commenter asking:

I’d love to hear more about the motivations for crafting such a culture as well as some particular implementation details. I’m curious if there are drawbacks to fostering "openness, curiosity, and communication" within an organization?

The culture at Oxide is in fact very deliberate: when starting a company, one is building many things at once (the team, the product, the organization, the brand) — and the culture will both inform and be reinforced by all of these. Setting that first cultural cornerstone was very important to us — starting with our mission, principles, and values. Critically, by using our mission, principles, and values as the foundation for our hiring process, we have deliberately created a culture that reinforces itself.

Some of the implementation details:

  • We have uniform compensation (even if it might not scale indefinitely)

  • We are writing intensive (but we still believe in spoken collaboration)

  • We have no formalized performance review process (but we believe in feedback)

  • We record every meeting (but not every conversation)

  • We have a remote work force (but we also have an office)

  • We are non-hierarchical (but we all ultimately report to our CEO)

  • We don’t use engineering metrics (but we all measure ourselves by our customers and their success)

If it needs to be said, there is plenty of ambiguity: if you are using absolutes to think of Oxide (outside of our principles of honesty, integrity, and decency!), you are probably missing some nuance of our culture.

Finally, to the (seemingly loaded?) question of the "drawbacks" of fostering "openness, curiosity, and communication" within an organization, the only drawback is that it’s hard work: culture has to be deliberate without being overly prescriptive, and that can be a tricky balance. In this regard, building a culture is very different from building (say) software: it is not engineered in a traditional sense, but is rather a gooey, squishy organism that will evolve over time. But the reward of the effort is something that its participants care intensely about: it will continue to be (in Cliff’s words) a culture that we work hard to not just create but defend!

OmniOS is not affected by CVE-2024-3094 OmniOS Community Edition

Yesterday we learned of a supply chain back door in the xz-utils software via an announcement at https://www.openwall.com/lists/oss-security/2024/03/29/4. The vulnerability was distributed with versions 5.6.0 and 5.6.1 of xz and has been assigned CVE-2024-3094.

OmniOS is NOT affected by CVE-2024-3094

The malicious code is only present in binary artefacts if the build system is Linux (and there are some additional constraints too) and if the system linker is GNU ld – neither of which is true for our packages. The payload is also a Linux ELF binary which would not successfully link into code built for OmniOS, and requires features which are only present in the GNU libc.

We have also only ever shipped xz-utils 5.6.x as part of the unstable bloody testing release; stable releases contain older versions:

  • r151038 ships version 5.2.6
  • r151046 ships version 5.4.2
  • r151048 ships version 5.4.4
  • bloody ships version 5.6.1

Despite being unaffected, we have now switched builds of xz in bloody to using the raw source archive, which does not contain the malicious injection code, and generating the autoconf files ourselves. We have not downgraded to an earlier version as it is not clear which earlier version can be considered completely safe given that the perpetrator has been responsible for maintaining and signing releases back to version 5.4.3. Once a cleaned 5.6.2 release is available, we will upgrade to that.
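
If you want to confirm what a given system is actually running, the installed version is easy to check; something like the following should do (the short package name here is an assumption; the full FMRI is along the lines of compress/xz):

# ask IPS which xz package is installed (package name assumed)
pkg info xz
# or ask the binary itself
xz --version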


Any problems or questions, please get in touch.

Disabling Monospaced Font Ligatures Josef "Jeff" Sipek

A recent upgrade of FreeBSD on my desktop resulted in just about every program (Firefox, KiCAD, but thankfully not urxvt) rendering various ligatures even for monospaced fonts. Needless to say, this is really annoying when looking at code, etc. Not having any better ideas, I asked on Mastodon if anyone knew how to turn this mis-feature off.

About an hour later, @monwarez@bsd.cafe suggested dropping the following XML in /usr/local/etc/fonts/conf.avail/29-local-noto-mono-fixup.conf and adding a symlink in ../conf.d to enable it:

<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "urn:fontconfig:fonts.dtd">
<fontconfig>
        <description>Disable ligatures for monospaced fonts to avoid ff, fi, ffi, etc. becoming only one character wide</description>
        <match target="font">
                <test name="family" compare="eq">
                        <string>Noto Sans Mono</string>
                </test>
                <edit name="fontfeatures" mode="append">
                        <string>liga off</string>
                        <string>dlig off</string>
                </edit>
        </match>
</fontconfig>
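
For completeness, the enabling symlink mentioned above amounts to something like this (paths as in the post; the fc-cache step is optional but harmless):

# link the fixup into conf.d so fontconfig actually loads it
ln -s ../conf.avail/29-local-noto-mono-fixup.conf \
    /usr/local/etc/fonts/conf.d/29-local-noto-mono-fixup.conf
# optionally rebuild the font cache
fc-cache -f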

This solved my problem. Hopefully this will help others. If not, it’s a note-to-self for when I need to reapply this fixup :)

Moore's Scofflaws Oxide Computer Company Blog

Years ago, Jeff Bezos famously quipped that "your margin is my opportunity." This was of course aimed not at Amazon’s customers, but rather its competitors, and it was deadly serious: customers of AWS in those bygone years will fondly remember that every re:Invent brought with it another round of price cuts. This era did not merely reflect Bezos’s relentless execution, but also a disposition towards who should reap the reward of advances in underlying technology: Amazon believed (if implicitly) that improvements at the foundations of computing (e.g., in transistor density, core count, DRAM density, storage density, etc.) should reflect themselves in lower prices for consumers rather than higher margins for suppliers.

Price cuts are no longer a re:Invent staple, having been replaced by a regular Amazon tradition of a different flavor: cutting depreciation (and therefore increasing earnings) by extending the effective life of their servers. (These announcements are understandably much more subdued, as "my depreciation is my margin opportunity" doesn’t have quite the same ring to it.)

As compute needs have grown and price cuts have become an increasingly distant memory, some have questioned their sky-high cloud bills, wondering if they should in fact be owning their compute instead of renting it. When we started Oxide, we knew from operating our own public cloud what those economics looked like — and we knew that over time others of a particular scale would come to the same realization that they would be better off not giving their margin away by renting compute. (Though it’s safe to say that we did not predict that it would be DHH leading the charge!)

Owning one’s own cloud sounds great, but there is a bit that’s unsaid: what about the software? Software is essential for elastic, automated infrastructure: hardware alone does not a cloud make! Unfortunately, the traditional server vendors do not help here: because of a PC-era divide in how systems are delivered, customers are told to look elsewhere for any and all system software. This divide is problematic on several levels. First, it impedes the hardware/software co-design that we (and, famously, others!) believe is essential to deliver the best possible product. Second, it leads to infamous finger pointing when the whole thing doesn’t work. But there is also a thorny economic problem: when your hardware and your software don’t come from the same provider, to whom should go the spoils of better hardware?

To someone who has just decided to buy their hardware out of their frustration with renting it, the answer feels obvious: whoever owns the hardware should naturally benefit from its advances! Unfortunately, the enterprise software vendor delivering your infrastructure often has other ideas — and because their software is neither rented nor bought, but rather comes from the hinterlands of software licensing, they have broad latitude as to how it is priced and used. In particular, this allows them to charge based on the hardware that you run it on — to have per-core software licensing.

This galling practice isn’t new (and is in fact as old as symmetric multiprocessing systems), but it has taken on new dimensions in the era of chiplets and packaging innovation: the advances that your next CPU has over your current one are very likely to be expressed in core count. Per-core licensing allows a third party — who neither made the significant investment in developing the next generation of microprocessor nor paid for the part themselves — to exact a tax on improved infrastructure. (And this tax can be shockingly brazen!) Couple this with the elimination of perpetual licensing, and software costs can potentially absorb the entire gain from a next-generation CPU, leaving a disincentive to run newer, more efficient infrastructure. As an industry, we have come to accept this practice, but we shouldn’t: in the go-go era of Dennard scaling (when clock rates rose at a blistering rate), software vendors never would have been allowed to get away with charging by the gigahertz; we should not allow them to feel so emboldened to charge by core count now!

If it needs to be said, we have taken a different approach at Oxide: when you buy the Oxide cloud computer, all of the software to run it is included. This includes all of the software necessary to run the rack as elastic infrastructure: virtual compute, virtual storage, virtual networking. (And yes, it’s all open source — which unfortunately demands the immediate clarification that it’s actually open source rather than pretend open source.) When we add a new feature to our software, there is no licensing enablement or other such nuisance — the feature just comes with the next update. And what happens when AMD releases a new CPU with twice the core count? The new sled running the new CPU slots into your existing rack — you’re not paying more than the cost of the new sled itself. This gives the dividends of Moore’s Law (or Wright’s Law!) to whom they rightfully belong: the users of compute.

The SunOS JDK builder The Trouble with Tribbles...

I've been building OpenJDK on Solaris and illumos for a while.

This has been moderately successful; illumos distributions now have access to up-to-date LTS releases, most of which work well. (At least 11 and 17 are fine; 21 isn't quite right.)

There are even some third-party collections of my patches, primarily for Solaris (as opposed to illumos) builds.

I've added another tool. The SunOS jdk builder.

The aim here is to be able to build every single jdk tag, rather than going to one of the existing repos which only have the current builds. And, yes, you could grope through the git history to get to older builds, but one problem with that is that you can't actually fix problems with past builds.

Most of the content is in the jdk-sunos-patches repository. Here there are patches for both illumos and Solaris (they're ever so slightly different) for every tag I've built.

(That's almost every jdk tag since the Solaris/SPARC/Studio removal, and a few before that. Every so often I find I missed one. And there's been the odd bad patch along the way.)

The idea here is to make it easy to build every tag, and to do so on a current system. I've had to add new patches to get some of the older builds to work. The world has changed: we have newer compilers and other tools, and the OS we're building on has evolved. So if someone wanted to start building the jdk from scratch (and remember that you have to build all the versions in sequence), then this would be useful.
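
To give a rough idea of what "in sequence" means: building a given OpenJDK release needs the previous (or same) release as its boot JDK, so a from-scratch chain ends up looking something like the sketch below (paths and options here are purely illustrative, not the builder's actual invocation):

# hypothetical sketch: jdk N is configured with jdk N-1 as the boot JDK
cd jdk17u
bash configure --with-boot-jdk=/opt/jdk16    # jdk16 was built in the previous step
gmake images                                 # the resulting image then boots the jdk18 build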

I'm using it for a couple of other things.

One is to put back SPARC support on illumos and Solaris. The initial port I did was on x86 only, so I'm walking through older builds and getting them to work on SPARC. We'll almost certainly not get to jdk21, but 17 seems a reasonable target.

The other thing is to enable the test suites, and then run them, and hopefully get them clean. At the moment they aren't, but a lot of that is because many tests are OS-specific and don't know what Solaris is, so they get confused. With all the tags, I can bisect on failures and (hopefully) fix them.
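
For reference, the stock OpenJDK build already has the test hooks wired up, so running a subset looks roughly like this (the targets are standard OpenJDK; the specific selections are just examples):

# run the tier1 test group via jtreg
gmake test TEST=tier1
# or narrow to one area when chasing a particular failure
gmake test TEST="jtreg:test/jdk/java/lang/ProcessBuilder"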