Using the 'zadm' utility to create an lx zone OmniOS Community Edition

This week’s ascii-cast shows how to use zadm to quickly create an lx-branded zone in OmniOS r151034 and how to configure a basic policy in the zone firewall. The zone firewall works for all zone brands except for KVM and is ideal for securing an lx zone where the native iptables tools don’t work.

nginx is used as an example application here, but in reality one would deploy something like nginx in a native branded zone, most likely with the sparse brand.

This is just a basic zone setup. Configuring more features such as memory and CPU caps will be covered in a future article.
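
The cast walks through the details; as a rough sketch of the kind of commands involved (not a transcript -- the zone name is made up and the exact options are described in zadm(8)):

zadm create -b lx lxtest   # opens an editor over the zone's JSON configuration;
                           # an lx zone also needs a Linux image to install from
zadm boot lxtest
zadm list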


zadm is open source and hosted on Github. Feedback and pull requests are welcome.

Any questions, please get in touch!

procmail and threads Staring at the C

I’m apparently old-school enough to find gmail and co painfully inefficient for handling significant amounts of mail. I still find procmail+mutt hard to beat. One thing mutt can’t do, however, is filter threads automatically - there’s no “mute” facility like gmail has; threads have to be processed manually.

Equally, procmail itself has no threading facilities or understanding of Message-Id or References.

Matching email threads

It can be done, though, with some cheesy awk:

#!/bin/bash

#
# If a mail message has a References: value found in the refs file, then
# add the requested header.
#
# Usage:
#
# cat mail_msgs | match-thread.sh ~/.mail.refs.muted "Muted: true"
#

ref_file="$1"
header="$2"

mail=/tmp/match-thread.mail.$$
cat - >$mail

newrefs="$(cat $mail | formail -x references -x message-id | tr -d '\n')"

touch $ref_file

cat $ref_file | awk -v newrefs="$newrefs" '

   BEGIN {
           found = 0;
           split(newrefs, tmp);
           for (i in tmp) {
                   refs[tmp[i]]++;
           }
   }

   # Each thread will have one line in the ref file, with
   # space-separated references. So we just need to look for any
   # reference from the mail.
   {
           for (ref in refs) {
                   if (index($0, ref) != 0) {
                           found = 1;
                           exit(0);
                   }
           }
   }

   END {
           exit(found ? 0 : 1);
   }
'

if [[ $? = 0 ]]; then
        cat $mail | formail -i "$header"
else
        cat $mail
fi

rm $mail

Essentially, we record all the References in the thread we’re trying to act on. Then we can trigger the above to see if the new mail is part of the thread of interest.

(This seems like the sort of thing formail could do, given its -D option has a message ID cache, but I haven’t even bothered to take a look at how hard that would be…)

procmail usage

In .procmailrc, we’d use this like so:

:0 Wfh: formail.lock
| $HOME/src/procmail-thread/match-thread.sh $HOME/.refs.muted "Procmail-Muted: true"

:0 Wfh: formail.lock
| $HOME/src/procmail-thread/match-thread.sh $HOME/.refs.watched "Procmail-Watched: true"

This will add the given header if we find any of the email’s References values in our “database”.

Then, we can do what we like with the mails, like deliver them as already-read, carbon copy them to the inbox, etc.:

:0
* Procmail-Muted: true
{
        SWITCHRC=$HOME/.procmailrc.markread
}

:0
* Procmail-Watched: true
{
        :0 c:
        $DEFAULT

        SWITCHRC=$HOME/.procmailrc.markread
}

:0
$DEST/

mutt usage

To actually watch or mute a thread, we add a couple of mutt macros:

macro index,pager "M" "|~/src/procmail-thread/add-thread.sh ~/.refs.muted<return>"
macro index,pager "W" "|~/src/procmail-thread/add-thread.sh ~/.refs.watched<return>"

The add-thread.sh script is similar to the above, but populates the refs file with all message IDs found in the given email.
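
For illustration, here’s a minimal sketch of what such a script could look like; this is a reconstruction from the description above rather than the repo’s actual contents:

#!/bin/bash

#
# Append this mail's thread references to the given refs file, so that
# match-thread.sh can later recognise follow-ups to it. (Sketch only.)
#
# Usage:
#
# cat mail_msg | add-thread.sh ~/.mail.refs.muted
#

ref_file="$1"

# One line per thread: the message's own ID plus everything it references.
refs="$(formail -x references -x message-id | tr -s '[:space:]' ' ')"

touch "$ref_file"
echo "$refs" >>"$ref_file"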

I put all this in a git repo.

Arrow Keys in Firefox Staring at the C

I’m not the only one disappointed in the way the web has worked out in lots of ways. From <blink> onwards, so much semantic information is missing from the average website, sometimes wilfully it appears. Why is there so little structural data on what the components of a page are?

One particular peccadillo I dislike is “Previous/Next Page” elements on a list page. Nobody ever uses <a rel="next" ...>. If you’re lucky, there’s an aria-label attribute for accessibility purposes, but as it’s free-form text and there isn’t even a convention, it could be pretty much anything.

For reasons unclear to me, almost no sites make use of the left/right arrow keys for navigation. So if I want to map those keys to prev/next, instead of a nice little bit of configuration, I have to resort to this user script:

(function() {
    'use strict';

    /* NB: we already tested for prefix/suffix, so this RE is OK. */
    function wholeWordMatch(haystack, needle) {
        let r = new RegExp("\\s" + needle + "\\s");
        return r.test(haystack);
    };

    const LEFT_KEY_CODE = 37;
    const RIGHT_KEY_CODE = 39;

    const prevStrings = [
        "previous page",
        "previous",
        "prev"
    ];

    const nextStrings = [
        "next page",
        "next"
    ];

    document.addEventListener("keyup", function(e) {

        if (!e) {
            e = window.event;
        }

        if (e.isComposing) {
            return;
        }

        switch (e.target.tagName) {
            case "TEXTAREA":
            case "INPUT":
                return;
        }

        const key = e.keyCode ? e.keyCode : e.which;

        var matches = undefined;

        if (key == LEFT_KEY_CODE) {
            matches = prevStrings;
        } else if (key == RIGHT_KEY_CODE) {
            matches = nextStrings;
        } else {
            return;
        }

        let found = undefined;
        let score = 0;

        document.querySelectorAll("a").forEach((link) => {
            let strs = [ link.textContent ];

            if (!link.href) {
                return;
            }

            /* This is often a good match if the text itself isn't. */
            if (link.attributes["aria-label"]) {
                strs.push(link.attributes["aria-label"].nodeValue);
            }

            for (let str of strs) {
                if (typeof str === "undefined") {
                    return;
                }

                str = str.toLowerCase();

                /*
                 * There's no perfect way to find the "best" link, but in
                 * practice this works on a reasonable number of sites: an exact
                 * match, or exact prefix or suffix, always wins; otherwise, we
                 * match a whole-word sub-string: "Go to prev <<" will match,
                 * but not "dpreview.com".
                 */
                for (let match of matches) {
                    if (str === match) {
                        found = link;
                        break;
                    }

                    if (str.startsWith(match) || str.endsWith(match)) {
                        found = link;
                        break;
                    }

                    if (score < 1 && wholeWordMatch(str, match)) {
                        found = link;
                        score = 1;
                    }
                }
            }
        });

        if (found) {
            found.click();
        }

    }, true);
})();

Yet again, hacky, but it mostly works. It’s pretty cool that this is even possible though.

Now you can boot SmartOS off of a ZFS pool Kebe Says: Dan McD's blog

Booting from a zpool

The most recent published biweekly release of SmartOS has a new feature I authored: the ability to manage and boot SmartOS-bootable ZFS pools.

A few people read about this feature, and jumped to the conclusion that the SmartOS boot philosophy, enumerated here:

  • The "/" filesystem is on a ramdisk
  • The "/usr" filesystem is read-only
  • All of the useful state is stored on the zones ZFS pool.

was suddenly thrown out the window. Nope.

This change is the first phase in a plan to not depend on ISO images or USB sticks for SmartOS, or Triton, to boot.

The primary thrust of this specific SmartOS change was to allow installation-time enabling of a bootable zones pool. The SmartOS installer now allows one to specify a bootable pool, either one created during the "create my special pools" shell escape, or just by specifying zones.

A secondary thrust of this change was to allow running SmartOS deployments to upgrade their zones pools to be BIOS bootable (if the pool structure allows booting), OR to create a new pool with new devices (and use zpool create -B) to be dedicated to boot. For example:

smartos# zpool create -B c3t0d0 standalone
smartos# piadm bootable -e standalone
smartos# piadm bootable
standalone                     ==> BIOS and UEFI
zones                          ==> non-bootable
smartos# 

Under the covers

Most of what’s above can be gleaned from the manual page. This section will discuss what the layout of a bootable pool actually looks like, and how the piadm(1M) command sets things up, and expects things to BE set up.

Bootable pool basics

The piadm bootable command will indicate if a pool is bootable at all via the setting of the bootfs property on the pool. That gets you the BIOS bootability check, which admittedly is an assumption. The UEFI check happens by finding the disk's s0 slice, and seeing if it's formatted as pcfs, and if the proper EFI System Partition boot file is present.
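
For example, checking that property directly on the pool used in the next section might look like this (illustrative output):

smartos# zpool get bootfs zones
NAME   PROPERTY  VALUE       SOURCE
zones  bootfs    zones/boot  local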

bootfs layout

For standalone SmartOS booting, the bootfs is supposed to be mounted under "/" at a pathname equal to the bootfs name. By convention, we prefer POOL/boot. Let's take a look:


smartos# piadm bootable
zones                          ==> BIOS and UEFI
smartos# piadm list
PI STAMP           BOOTABLE FILESYSTEM            BOOT BITS?   NOW   NEXT  
20200810T185749Z   zones/boot                     none         yes   yes  
20200813T030805Z   zones/boot                     next         no    no   
smartos# cd /zones/boot
smartos# ls -lt
total 9
lrwxrwxrwx   1 root     root          27 Aug 25 15:58 platform -> ./platform-20200810T185749Z
lrwxrwxrwx   1 root     root          23 Aug 25 15:58 boot -> ./boot-20200813T030805Z
drwxr-xr-x   3 root     root           3 Aug 14 16:10 etc
drwxr-xr-x   4 root     root          15 Aug 13 06:07 boot-20200813T030805Z
drwxr-xr-x   4 root     root           5 Aug 13 06:07 platform-20200813T030805Z
drwxr-xr-x   4 1345     staff          5 Aug 10 20:30 platform-20200810T185749Z
smartos# 

Notice that the Platform Image stamp 20200810T185749Z is currently booted, and will be booted the next time. Notice, however, that there are no “BOOT BITS”, also known as the Boot Image, for 20200810T185749Z, and instead the 20200813T030805Z boot bits are employed? This allows a SmartOS bootable pool to update just the Platform Image (ala Triton) without altering loader. If one utters piadm activate 20200813T030805Z, then things will change:

smartos# piadm activate 20200813T030805Z
smartos# piadm list
PI STAMP           BOOTABLE FILESYSTEM            BOOT BITS?   NOW   NEXT  
20200810T185749Z   zones/boot                     none         yes   no   
20200813T030805Z   zones/boot                     next         no    yes  
smartos# ls -lt
total 9
lrwxrwxrwx   1 root     root          27 Sep  2 00:25 platform -> ./platform-20200813T030805Z
lrwxrwxrwx   1 root     root          23 Sep  2 00:25 boot -> ./boot-20200813T030805Z
drwxr-xr-x   3 root     root           3 Aug 14 16:10 etc
drwxr-xr-x   4 root     root          15 Aug 13 06:07 boot-20200813T030805Z
drwxr-xr-x   4 root     root           5 Aug 13 06:07 platform-20200813T030805Z
drwxr-xr-x   4 1345     staff          5 Aug 10 20:30 platform-20200810T185749Z
smartos# 

piadm(1M) manipulates symbolic links in the boot filesystem to set versions of both the Boot Image (i.e. loader) and the Platform Image.

Tailscale for illumos Nahum Shalman

NOTE: This page is probably still a work in progress. I'm slowly working on upstreaming the necessary changes in small bits of spare time as I find them. If others have time/interest to help, I'd certainly appreciate it!

Background reading:

A while back, Josh Clulow did an initial port of wireguard-go to illumos. I was able to make some small modifications to it sufficient to get it to point the Tailscale client code at it and build tailscale binaries for illumos. Here's how to build them yourself:

You'll need git and a Go toolchain. I tested in a SmartOS zone using go114-1.14.4 and that seems to work. In the future we'll need Go 1.15.

<install git and a go toolchain>
git clone https://github.com/nshalman/wireguard-go -b tailscale-illumos wireguard-go
git clone https://github.com/nshalman/tailscale -b illumos tailscale
cd tailscale
sed -e "s|/home/admin/wireguard-go|$(cd ../wireguard-go ; pwd)|" -i go.mod
GOOS=illumos go build tailscale.com/cmd/tailscale
GOOS=illumos go build tailscale.com/cmd/tailscaled

As of the time of writing, my initial pull request to get the wireguard-go bits upstreamed is waiting on a combination of cleanups, and hopefully some additional functionality being added to x/sys/unix. Once that's done, a cleanup of my fork of tailscale to be ready for upstreaming will hopefully follow (the feature request where it's being tracked is here).

Since you shouldn't trust random people on the internet, you should probably review the code and build it yourself. However, if you know and trust me, I've uploaded binaries and my SMF manifest which you can download and experiment with (the version strings in the binaries are patched so that you know you got them from me):

To use them, drop the binaries into /usr/local/sbin, mkdir /etc/tailscale, import the manifest (svccfg import tailscale.xml), start the daemon (svcadm enable tailscale), then run tailscale up (I'm assuming you've already set up your Tailscale account, etc.)
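
In other words, something along these lines, assuming the binaries and tailscale.xml were downloaded into the current directory:

cp tailscale tailscaled /usr/local/sbin/
mkdir /etc/tailscale
svccfg import tailscale.xml
svcadm enable tailscale
tailscale up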

Help is welcome and I intend to update this page as the work progresses. Last updated August 31, 2020. Feel free to troll me on Twitter if this note is still here and it's 2021. :)

On a related note, if you only need WireGuard and not Tailscale, you should also check out Mike Zeller's illumos port of boringtun, which is a Rust implementation. I think I managed to get it working, but stopped experimenting with it when Tailscale came out...

Home Data Center 3.0 -- Part 2: HDC's many uses Kebe Says: Dan McD's blog

In the prior post, I mentioned a need for four active ethernet ports. These four ports are physical links to four distinct Ethernet networks. Joyent's SmartOS and Triton characterize these with NIC Tags. I just view them as distinct networks. They are all driven by the illumos igb(7d) driver (hmm, that man page needs updating) on HDC 3.0, and I'll specify them now:

  • igb0 - My home network.
  • igb1 - The external network. This port is directly attached to my FiOS Optical Network Terminal's Gigabit Ethernet port.
  • igb2 - My work network. Used for my workstation, and "external" NIC Tag for my work-at-home Triton deployment, Kebecloud.
  • igb3 - Mostly unused for now, but connected to Kebecloud's "admin" NIC Tag.

The zones abstraction in illumos allows not just containment, but a full TCP/IP stack to be assigned to each zone. This makes a zone feel more like a proper virtual machine in most cases. Many illumos distros are able to run a full VMM as the only process in a zone, which ends up delivering a proper virtual machine. As of this post's publication, however, I'm only running illumos zones, not full VM ones. Here's their list:

(0)# zoneadm list -cv
  ID NAME             STATUS     PATH                           BRAND    IP    
   0 global           running    /                              ipkg     shared
   1 webserver        running    /zones/webserver               lipkg    excl  
   2 work             running    /zones/work                    lipkg    excl  
   3 router           running    /zones/router                  lipkg    excl  
   4 calendar         running    /zones/calendar                lipkg    excl  
   5 dns              running    /zones/dns                     lipkg    excl  
(0)# 

Their zone names correspond to their jobs:
  • global - The illumos global zone is what exists even in the absence of other zones. Some illumos distros, like SmartOS, encourage minimizing what a global zone has for services. HDC's global zone serves NFS and SMB/CIFS to my home network. The global zone has the primary link into the home network. HDC's global zone has no default route, so any operations that need out-of-the-house networking either go through another zone (e.g. DNS lookups) or require a default route to be temporarily added (e.g. NTP chimes, `pkg update`).
  • webserver - Just like the name says, this zone hosts the web server for kebe.com. For this zone, it uses lofs(7FS), the loopback virtual file system to inherit subdirectories from the global zone. I edit blog entries (like this one) for this zone via NFS from my laptop. The global zone serves NFS, but the files I'm editing are not only available in the global zone, but are also lofs-mounted into the webserver zone as well. The webserver zone has a vnic (see here for details about a vnic, the virtual network interface controller) link to the home network, but has a default route, and the router zone's NAT (more later) forwards ports 80 and 443 to this zone. Additionally, the home network DHCP server lives here, for no other reason than, "it's not the global zone."
  • work - The work zone is new in the past six years, and as of recently, eschews lofs(7FS) for delegated ZFS datasets. A delegated ZFS dataset, a proper filesystem in this case, is assigned entirely to the zone. This zone also has the primary (and only) link to the work network, a physical connection (for now unused) to my work Triton's admin network, and an etherstub vnic (see here for details about an etherstub) link to the router zone. The work zone itself is a router for work network machines (as well as serves DNS for the work network), but since I only have one public IP address, I use the etherstub to link it to the router zone. The zone, as of recent illumos builds, can further serve its own NFS. This allows even less global-zone participation with work data, and it means work machines do not need backchannel paths to the global zone for NFS service. The work zone has a full illumos development environment on it, and performs builds of illumos rather quickly. It also has its own Unbound (see the DNS zone below) for the work network.
  • router - The router zone does what the name says. It has a vnic link to the home network and the physical link to the external network. It runs ipnat to NAT etherstub work traffic or home network traffic to the Internet, and redirects well-known ports to their respective zones (a sketch of such an ipnat configuration follows this list). It does not use a proper firewall, but has IPsec policy in place to drop anything that isn't matched by ipnat, because in a no-policy situation, ipnat lets unmatched packets arrive on the local zone. The router zone also runs the (alas still closed source) IKEv1 daemon to allow me remote access to this server while I'm remote. It uses an old test tool from the pre-Oracle Sun days a few of you half-dozen readers will know by name. We have a larval IKEv2 out in the community, and I'll gladly switch to that once it's available.
  • calendar - Blogged about when first deployed, this zone's sole purpose is to serve our calendar both internally and externally. It uses the Radicale server. Many of my complaints from the prior post have been alleviated by subsequent updates. I wish the authors understood interface stability a bit better (jumping from 2.x to 3.0 was far more annoying than it needed to be), but it gets the job done. It has a vnic link to the home network, a default route, and gets calendaring packets shuffled to it by the router zone so my family can access the calendar wherever we are.
  • dns - A recent switch to OmniOSce-supported NSD and Unbound encouraged me to bring up a dedicated zone for DNS. I run both daemons here, and have the router zone redirect public kebe.com requests here to NSD. The Unbound server services all networks that can reach HDC. It has a vnic link to the home network, and a default route.
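
As promised above, here is a rough sketch of what such an ipnat configuration for the router zone could look like. The interface name and addresses are made up, and the rules are illustrative only, not my actual configuration:

# NAT home-network (and etherstub) traffic out the external interface.
map ext0 192.168.1.0/24 -> 0/32 portmap tcp/udp auto
map ext0 192.168.1.0/24 -> 0/32

# Redirect ports 80/443 to the webserver zone and 53 to the dns zone.
rdr ext0 0.0.0.0/0 port 80 -> 192.168.1.10 port 80 tcp
rdr ext0 0.0.0.0/0 port 443 -> 192.168.1.10 port 443 tcp
rdr ext0 0.0.0.0/0 port 53 -> 192.168.1.11 port 53 tcp/udp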

The first picture shows HDC as a single entity, and its physical networks. The second picture shows the zones of HDC as Virtual Network Machines, which should give some insight into why I call my home server a Home Data Center.

[Image: HDC, physically]
[Image: HDC, logically]

FreeBSD Sound: ALSA & Qt Josef "Jeff" Sipek

Sound in FreeBSD is somewhat complicated because of the various portability and compatibility shims. Last week, I hit an annoying-to-diagnose situation: I plugged in a USB sound card and while the kernel and some applications detected it just fine, other applications didn’t seem to notice it at all.

At first, I thought it was a Qt issue since only Qt applications appeared broken. But then, mere minutes before emailing a FreeBSD mailing list, I managed to find a hint that it was likely an ALSA on FreeBSD issue. Some searching later, I learned that in order for ALSA to see the device, it needed a mapping to the actual OSS device.

So, after adding the following to ~/.asoundrc, any ALSA application (and therefore any Qt application) that tries to list the sound devices will see a “ft991a” device:

pcm.ft991a {
        type oss
        device /dev/dsp3
}

To make it more explicit, without adding the above stanza to .asoundrc:

  1. OSS applications work fine.
  2. PortAudio applications work fine.
  3. ALSA applications do not see the device.

With the stanza, everything seems to work.

Home Data Center 3.0 -- Part 1: Back to AMD Kebe Says: Dan McD's blog

Twelve years ago I built my first Home Data Center (HDC). Six years ago I had OmniTI's Supermicro rep put together the second one.

Unlike last time, I'm not going to recap the entirety of HDC 2.0. I will mention briefly that since its 2014 inception, I've only upgraded its mirrored spinning-rust disk drives twice: once from 2TB to 6TB, and last year from 6TB to 14TB. I'll detail the current drives in the parts list.

Like last time, and the time before it, I started with a CPU in mind. AMD has been on a tear with Ryzen and EPYC. I still wanted low-ish power, but since I use some of HDC's resources for work or the illumos community, I figured a core-count bump would be worth the cost of some watts. Lucky me, the AMD Ryzen 7 3700x fit the bill nicely: Double the cores & threads with a 20W TDP increase.

Unlike last time, but like the time before it, I built this one from parts myself. It took a little digging, and I made one small mistake in parts selection, but otherwise it all came together nicely.

Parts

  • AMD Ryzen 7 3700x - It supports up to 128GB of ECC RAM, it's double the CPU of the old HDC for only 50% more TDP wattage. It's another good upgrade.
  • Noctua NH-U12S (AM4 edition) CPU cooler - I was afraid the stock cooler would cover the RAM slots on the motherboard. Research suggested the NH-U12S would prevent this problem, and the research panned out. Also Noctua's support email, in spite of COVID, has been quite responsive.
  • ASRock Rack X470D4U - While only having two Gigabit Ethernet (GigE) ports, this motherboard was the only purpose-built Socket AM4 server motherboard. It has IPMI/BMC on its own Ethernet port (but you'll have to double check it doesn't "failover" to your first GigE port). It has four DIMM slots, and with the current BIOS (mine shipped with it), supports 128GB of RAM. There are variants with two 10 Gigabit Ethernet (10GigE) ports, but I opted for the less expensive GigE one. If I'd wanted to wait, there's a new, not yet available, X570 version, whose more expensive version has both two 10GigE AND two GigE ports, which would have saved me from needing...
  • Intel I350 dual-port Gigabit Ethernet card - This old reliable is well supported and tested. It brings me up to the four ethernet ports I need.
  • Nemix RAM - 4x32GB PC3200 ECC Unbuffered DIMMS - Yep, like HDC 2.0, I maxxed out my RAM immediately. 6 years ago I'd said 32GB would be enough, and for the most part that's still true, except I sometimes wish to perform multiple concurrent builds, or memory-map large kernel dumps for debugging. The vendor is new-to-me, and did not have a lot of reviews on Newegg. I ran 2.5 passes of memtest86 against the memory, and it held up under those tests. Nightly builds aren't introducing bitflips, which I saw on HDC 1.0 when it ran mixed ECC/non-ECC RAM.
  • A pair of 500GB Samsung 860 EVO SATA SSDs - These are slightly used, but they are mirrored, and partitioned as follows:
    • s0 -- 256MB, EFI System Partition (ESP)
    • s1 -- 100GB, rpool for OmniOSce
    • s2 -- 64GB, ZFS intent log device (slog)
    • s3 -- 64GB, unused, possible future L2ARC
    • s4 -- 2GB, unused
    • The remaining 200-something GB is unassigned, and fodder for the wear-levellers. The motherboard HAS a pair of M.2 connectors for NVMe or SATA SSDs in that form-factor, but these were hand-me-downs, so free.
  • A pair of Western Digital Ultrastar (nee HGST Ultrastar) HC530 14TB Hard Drives - These are beasts, and according to Backblaze stats released less than a week ago, its 12TB siblings hold up very well with respect to failure rates.
  • Fractal Design Meshify C case - I'd mentioned a small mistake, and this case was it. NOT because the case is bad... the case is quite good, but because I bought the case thinking I needed to optimize for the microATX form factor, and I really didn't need to. The price I paid for this was the inability to ever expand to four 3.5" drives if I so desire. In 12 years of HDC, though, I've never exceeded that. That's why this is only a small mistake. The airflow on this case is amazing, and there's room for more fans if I ever need them.
  • Seasonic Focus GX-550 power supply - In HDC 1.0, I had to burn through two power supplies. This one has a 10 year warranty, so I don't think I'll have to stress about it.
  • OmniOSce stable releases - Starting with HDC 2.0, I've been running OmniOS, and its community-driven successor, OmniOSce. The every-six-month stable releases strike a good balance between refreshes and stability.

I've given two talks on how I use HDC. Since the last of those was six years ago, I'm going to stop now, and dedicate the next post to how I use HDC 3.0.

Now self-hosted at kebe.com Kebe Says: Dan McD's blog

Let's throw out the first pitch.

I've moved my blog over from blogspot to here at kebe.com. I've recently upgraded the hardware for my Home Data Center (the subject of a future post), and while running the Blahg software doesn't require ANY sort of hardware upgrade, I figured since I had the patient open I'd make the change now.

Yes it's been almost five years since last I blogged. Let's see, since the release of OmniOS r151016, I've:

  • Cut r151018, r151020, and r151022.
  • Got RIFfed from OmniTI.
  • Watched OmniOS become OmniOSce with great success.
  • Got hired at Joyent and made more contributions to illumos via SmartOS.
  • Tons more I either wouldn't blog about, or just plain forgot to mention.
So I'm here now, and maybe I'll pick up again? The most prolific year I had blogging was 2007 with 11 posts, with 2011 being 2nd place with 10. Not even sure if I *HAVE* a half-dozen readers anymore, but now I have far more control over the platform (and the truly wonderful software I'm using).

While Blahg supports comments, I've disabled them for now. I might re-enable them down the road, but for now, you can find me on one of the two socials on the right and comment there.

Goodbye blogspot Kebe Says: Dan McD's blog

First off, long time no blog!

This is the last post I'm putting on the Blogspot site. In the spirit of eating my own dogfood, I've now set up a self-hosted blog on my HDC. I'm sure it won't be hard for all half-dozen of you readers to move over. I'll have new content over there, at the very least the Hello, World post, a catchup post, and a HDC 3.0 post to match the ones for 1.0 and 2.0.

Coming Soon (updated) Kebe Says: Dan McD's blog

This is STILL only a test, but with an update. The big question remains: how quickly can I bring over my old Google-owned blog entries?

(BTW Jeff, I could get used to this LaTeX-like syntax…)

You’ll start to see stuff trickle in. I’m using the hopefully-pushed-back-soon enhancements to Blahg that allow simple raw HTML for entries. I’ve done some crazy things to extract my old entries, and maybe the hello-world post here will explain that.

From 0-to-illumos on OmniOS r151016 Kebe Says: Dan McD's blog

Today we updated OmniOS to its next stable release: r151016. You can click the link to see its release notes, and you may notice a brief mention of the illumos-tools package.

I want to see more people working on illumos. A way to help that is to get people started on actually BUILDING illumos more quickly. To that end, r151016 contains everything to bring up an illumos development environment. You can develop small on it, but this post is going to discuss how we make building all of illumos-gate from scratch easier. (I plan on updating the older post on small/focused compilation after ws(1) and bldenv(1) effectively merge into one tool.)

The first thing you want to do is install OmniOS. The latest release media can be found here, on the Installation page.

After installation, your system is a blank slate. You'll need to set a root password, create a non-root user, and finally add networking parameters. The OmniOS wiki's General Administration Guide covers how to do this.

I've added a new building illumos page to the OmniOS wiki that should detail how straightforward the process is. You should be able to kick off a full nightly(1ONBLD) build quickly enough. If you don't want to edit one of the omnios-illumos-* samples in /opt/onbld/env, just make sure you have a $USER/ws directory, clone one of illumos-gate or illumos-omnios into $USER/ws/testws and use one of the template /opt/onbld/env/omnios-illumos-* files corresponding to illumos-gate or illumos-omnios. For example:


omnios(~)[0]% mkdir ws
omnios(~)[0]% cd ws
omnios(~/ws)[0]% git clone https://github.com/illumos/illumos-gate/ testws

omnios(~/ws)[0]% /bin/time /opt/onbld/bin/nightly /opt/onbld/env/omnios-illumos-gate
You can then look in testws/log/log-date&time/mail_msg to see how your build went.

Quick Reminder -- tcp_{xmit,recv}_hiwat and high-bandwidth*delay networks Kebe Says: Dan McD's blog

I was recently working with a colleague on connecting two data centers via an IPsec tunnel. He was using iperf (coming soon to OmniOS bloody along with netperf) to test the bandwidth, and was disappointed in his results.

The amount of memory you need to hold a TCP connection's unacknowledged data is the Bandwidth-Delay product. The defaults shipped in illumos are small on the receive side:


bloody(~)[0]% ndd -get /dev/tcp tcp_recv_hiwat
128000
bloody(~)[0]%
and even smaller on the transmit side:

bloody(~)[0]% ndd -get /dev/tcp tcp_xmit_hiwat
49152
bloody(~)[0]%

Even on platforms with automatic tuning, the maximums they use are often not set high enough.
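
To put rough numbers on that (mine, for illustration): a hypothetical 1 Gbit/s path with a 40 ms round-trip time needs

1,000,000,000 bits/s / 8 = 125,000,000 bytes/s
125,000,000 bytes/s * 0.040 s = 5,000,000 bytes (~5 MB)

in flight to keep the pipe full -- about a hundred times the 49,152-byte transmit default and nearly forty times the 128,000-byte receive default shown above.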

Introducing IPsec into the picture adds additional latency (if not so much for encryption thanks to AES-NI & friends, then for the encapsulation and checks). This often is enough to take what are normally good enough maximums and invalidate them as too small. To change these on illumos, you can use the ndd(1M) command shown above, OR you can use the modern, persists-across-reboots, ipadm(1M) command:


bloody(~)[1]% sudo ipadm set-prop -p recv_buf=1048576 tcp
bloody(~)[0]% sudo ipadm set-prop -p send_buf=1048576 tcp
bloody(~)[0]% ipadm show-prop -p send_buf tcp
PROTO PROPERTY        PERM CURRENT    PERSISTENT DEFAULT    POSSIBLE
tcp   send_buf        rw   1048576    1048576    49152      4096-1048576
bloody(~)[0]% ipadm show-prop -p recv_buf tcp
PROTO PROPERTY        PERM CURRENT    PERSISTENT DEFAULT    POSSIBLE
tcp   recv_buf        rw   1048576    1048576    128000     2048-1048576
bloody(~)[0]%

There's future work there in not only increasing the upper bound (easy), but also adopting the automatic tuning so the maximum just isn't taken right off the bat.

New HDC service: Calendaring (or, The Limitation Game) Kebe Says: Dan McD's blog

I'll start by stating my biases: I don't like data bloat like ASN.1, XML, or even bloaty protocols like HTTP. (Your homework: Would a 1980s-developed WAN-scale RPC have obviated HTTP? Write a paper with your answer to that question, with support.) I understand the big problems they attempt to solve. I also still think not enough people in the business were paying attention in OS (or Networking) class when seeing the various attempts at data representation during the 80s and 90s. Also, I generally like pushing intelligence out to the end-nodes, and in client/server models, this means the clients. CalDAV rubs me the wrong way on the first bias, and MOSTLY the right way on my second bias, though the clients I use aren't very smart. I will admit near-complete ignorance of CalDAV. I poked a little at its RFC, looking up how Alarms are implemented, and discovered that mostly, Alarm processing is a client issue. ("This specification makes no attempt to provide multi-user alarms on group calendars or to find out for whom an alarm is intended.")

I've configured Radicale on my Home Data Center. I need to publicly thank Lauri Tirkkonen (aka. lotheac on Freenode) for the IPS publisher which serves me up Radicale. Since my target audience is my family-of-four, I wasn't particularly concerned with its reported lack of scalability. I also didn't want to have CalDAV be a supplicant of Apache or another web server for the time. If I decide to revisit my web server choices, I may move CalDAV to that new webserver (likely nginx). I got TLS and four users configured on stock Radicale.

My job was to make an electronic equivalent of our family paper calendar. We have seven (7) colors/categories for this calendar (names withheld from the search engines): Whole-Family, Parent1, Parent2, Both-Parents, Child1, Child2, Both-Children. I thought, given iCal (10.6), Calendar.app (10.10), or Calendar (iOS), it wouldn't be too hard for these to be created and shared. I was mildly wrong.

I'm not sure if what I had to do was a limitation of my clients, of Radicale, or of CalDAV itself, but I had to create seven (7) different accounts, each with a distinct ends-in-'/' URL:

  • https://.../Whole-Family.ics/
  • https://.../Parent1.ics/
  • https://.../Parent2.ics/
  • https://.../Both-Parents.ics/
  • https://.../Child1.ics/
  • https://.../Child2.ics/
  • https://.../Both-Children.ics/
I had to configure N (large N) devices or machine-logins with these seven (7) accounts. Luckily, Radicale DID allow me to restrict Child1's and Child2's write access to just their own calendars. Apart from that, we want the whole family to read all of the calendars. This means the colors are uniform across all of our devices (stored on the server). It also means any alarms (per above) trigger on ALL of our devices. This makes alarms (something I really like in my own Calendar) useless. Modulo the alarms problem (which can be mitigated by judicious use of iOS's Reminders app and a daily glance at the calendar), this seems to end up working pretty well, so far.

Both children recently acquired iPhones. Which means if I open this service outside our internal home network, we can schedule calendars no matter where we are, and get up to date changes no matter where we are. That will be extremely convenient.

I somewhat hope that one of my half-dozen readers will find something so laughably wrong with how I configured things that any complaints I make will be rendered moot. I'm not certain, however, that will be the case.

Toolsmiths - since everything is software now anyway... Kebe Says: Dan McD's blog

A recent twitter storm occurred in light of last week's #encryptnews event.

I was rather flattered when well-known whistleblower Thomas Drake retweeted this response of mine:

The mention of "buying usable software" probably makes sense to someone who's used to dealing with Commercial, Off-The-Shelf (COTS) software. We don't live in a world where COTS is necessarily safe anymore. There was a period (which I luckily lived and worked in), where Defense Department ARPA money was being directed specifically to make COTS software more secure and high-assurance. Given the Snowden revelations, however, COTS can possibly be a vulnerability as much as it could be a strength.

In the seminal Frederick Brooks book, The Mythical Man-Month, he describes one approach to software engineering: The Surgical Team. See here and scroll down for a proper description. Note the different roles for such a team.

Given that most media is equivalent to software (easily copied, distributed, etc.), I wonder if media organizations shouldn't adopt certain types of those organizational roles that have been until now the domain of traditional software. In particular, the role of the Toolsmith should be one that modern media organizations adopt. Ignoring traditional functions of "IT", a toolsmith for, say, an investigative organization should be well-versed in what military types like to call Defensive Information Warfare. Beyond just the mere use of encryption (NOTE: ANYONE who equates encryption with security should be shot, or at least distrusted), such Toolsmiths should enable their journalists (who would correspond to the surgeon or the assistant in the surgical team model) to do their job in the face of strong adversaries. An entity that needs a toolsmith will also need a software base, and unless the entity has resources enough to create an entire software stack, that entity will need Free Open-Source Software (for various definitions of Free and Open I won't get into for fear of derailing my point).

I haven't been working in security much since the Solaris Diaspora, so I'm a little out of touch with modern threat environments. I suspect it's everything I'd previously imagined, just more real, and where the word "foreign" can be dropped from "major foreign governments". Anyone who cares about keeping their information and themselves safe should, in my opinion, have at least a toolsmith on their staff. Several organizations do, or at least have technology experts, like the ACLU's Christopher Soghoian, for example. The analogy could probably extend beyond security, but I wanted to at least point out the use of an effective toolsmith.

Happy (early) 20th anniversary, IPv6 Kebe Says: Dan McD's blog

My first full-time job out of school was with the U.S. Naval Research Laboratory. It was a spectacular opportunity. I was going to be working on next-generation (at the time) Internet Protocol research and development.

When I joined in early 1994, the IPng proposals had been narrowed to three:

  • SIPP - Simple Internet Protocol Plus. 8-byte addresses, combined with a routing header that could, in theory, extend the space even further (inherited from IPng contender PIP).
  • TUBA - TCP Using Big Addresses. The use of OSI's CLNP with proven IPv4 transports TCP and UDP running over it.
  • CATNIP - Common Architecture for the Internet. I never understood this proposal, to be honest, but I believe it was an attempt to merge CLNP and IPv4.

NRL, well, my part of NRL anyway, placed its bet on SIPP. I was hired to help build SIPP for then-nascent 4.4BSD. (The first 10 months were actually on 4.3 Net/2 as shipped by BSDI!) It was a great team to work with, and our 1995 USENIX paper displayed our good work.

Ooops... I'm getting a bit ahead of myself.

The announcement of the IPng winner was to be at the 30th IETF meeting in Toronto, late in July. Some of us were fortunate to find out early that what would become IPv6 was SIPP, but with 16-byte addresses. Since I was building this thing, I figured it was time to get to work before Toronto.

20 years ago today, I sent this (with slightly reordered header fields) mail out to a subset of people. I didn't use the public mailing list, because I couldn't disclose SIPP-16 (which became IPv6) before the Toronto meeting. I also discovered some issues that later implementors would discover, as you can see.


From: "Daniel L. McDonald" &lt;danmcd&gt;
Subject: SIPP-16 stuff
To: danmcd (Daniel L. McDonald), cmetz (Craig Metz), atkinson (Ran Atkinson),
deering@parc.xerox.com, Bob.Hinden@eng.sun.com,
bob.gilligan@eng.sun.com, francis@cactus.ntt.jp,
rxg@thumper.bellcore.com, set@thumper.bellcore.com, bound@zk3.dec.com,
christian.huitema@sophia.inria.fr, conta@lassie.ucx.lkg.dec.com,
grehan@flotsm.ozy.dec.com, nordmark@jurassic-248.Eng.Sun.COM,
bill.simpson@um.cc.umich.edu, rj@sgi.com
Date: Thu, 21 Jul 1994 19:20:33 -0500 (EST)
Cc: vjs@sgi.com
X-Mailer: ELM [version 2.4 PL23]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Message-ID: <9407220020.aa02835@sundance.itd.nrl.navy.mil>
Content-Length: 1578
Status: RO
X-Status:
X-Keywords: NotJunk
X-UID: 155

SIPP folks,

Has anyone tried quick-n-dirty SIPP-16 mods yet?

We have managed to send/receive SIPP-16 pings across both Ethernet and
loopback. UDP was working with SIPP-8, and we're working on it for SIPP-16.
Minor multicast cases were working for SIPP-8 also, and will be moved to
SIPP-16. TCP will be forthcoming once we're comfortable with some of the
protocol control block changes.

My idea for the SIPP-16 sockaddr_sipp and sipp_addr is something like:

struct sipp_addr {
        u_long words[4];
};

struct sockaddr_sipp {
        u_char ss_len;          /* For BSD routing tree code. */
        u_char ss_family;
        u_short ss_port;
        u_long ss_reserved;
        struct sipp_addr ss_addr;
};

We've managed to use the above to configure our interfaces, and send raw
SIPP-16 ICMP pings. I've a feeling the routing tree will get hairy with the
new sockaddr_sipp. The size discrepancy between the sockaddr_sipp, and the
conventional sockaddr will cause other compatibility issues to arise.
(E.g. SIOCAIFADDR will not work with SIPP, but SIOCAIFADDR_SIPP will.)

We look forward to the implementors meeting, so we can talk about bloody
gory details, experience with certain internals (PCBs!), and to find out
how far behind we still are.

Dan McD, Craig Metz, & Ran Atkinson
--
Dan McDonald | Mail: {danmcd,mcdonald}@itd.nrl.navy.mil --------------+
Computer Scientist | WWW: http://wintermute.itd.nrl.navy.mil/danmcd.html |
Naval Research Lab | Phone: (202) 404-7122 #include <disclaimer.h> |
Washington, DC | "Rise from the ashes, A blaze of everyday glory" - Rush +

Funny how many defunct-or-at-least-renamed organizations (Sun, DEC, Bellcore) are in that mail. BTW, for Solarish systems, the SIOCSLIFADDR (note the 'L') became the ioctl of choice for longer sockaddrs. Also, this was before I discovered uintN_t data types.

If it wasn't clear from the text of the mail, we actually transmitted IPv6 packets across an Ethernet that day. It's possible these were the first IPv6 packets ever sent on a wire. (Other early implementations used IPv6-in-IPv4 exclusively.) I won't fully claim that honor here, but I do believe it could be true.

Home Data Center 2.0 - dogfooding again! Kebe Says: Dan McD's blog

Over six years ago, I put together my first home data center (HDC), which I assembled around a free CPU that was given to me.

A lot has happened in those six years. I've moved house, been through three different employers (and yes, I count Oracle as a different employer, for reasons you can see here), and most relevant to this blog post - technology has improved.

My old home server was an energy pig, loud, and hitting certain limits. The Opteron Model 185 has a TDP of 110 watts, and worse, the original power supply in the original HDC broke, and I replaced it with a LOUD one from a Sun w2100z workstation. I also replaced other parts over the years as things evolved. What I ended up with at the start of 2014 was:

  • AMD Opteron Model 185 - No changes here.
  • Tyan S2866 - Same here, too.
  • 4GB of ECC RAM - Up from 2GB of ECC, to the motherboard's maximum. I tried at first with two additional GB of non-ECC, but one nightly build of illumos-gate where I saw a single-bit error in one built binary was enough to convince me about ECC's fundamental goodness.
  • Two Intel S3500 80GB SATA SSDs - I use these as mirrored root, and mirrored slog, leaving alone ~20GB slices (16 + 4) each. I'm under the assumption that the Intel disk controller will do proper wear-leveling, and what-not. (Any corrections are most appreciated!) These replace two different, lesser-brand 64GB SSDs that crapped out on me.
  • Two Seagate ST2000DL003 2TB SATA drives. - I bought these on clearance a month before the big Thailand flood that disrupted the disk-drive market. At $30/TB, I still haven't found as good of a deal, and the batch on sale were of sufficient quality to not fail me or my mirrored data (so says ZFS, anyway).
  • Lian Li case - I still like the overall mechanical design of this brother-in-law recommended case. I already mentioned the power supply, so I'll skip that here.
  • A cheap nVidia 8400 card - It runs twm on a 1920x1200 display, good enough!
  • OpenIndiana - After moving OpenSolaris from SVR4 to IPS, I used OpenSolaris until Oracle happened. OI was a natural stepping stone off of OpenSolaris.
I gave a talk on how I use my HDC. I'll update that later in this post, but suffice to say, between the energy consumption and the desire for me and my family to enable more services, I figured it was time to upgrade the hardware. With my new job at OmniTI, I also wanted to start dogfooding something I was working with. I couldn't use NexentaStor with my HDC, because of the non-storage functions of Illumos I was using. OmniOS, on the other hand, was going to be a near-ideal candidate to replace OpenIndiana, especially given its server focus.

As before, I started with a CPU for the system. The Socket 1150 Xeon E3 chips, which we had on one server at Nexenta (to help with the Illumos bring-up of Intel's I210 and I217 ethernet chip, alongside Joyent and Pluribus), seemed an ideal candidate. Some models had low power draws, and they had all of the features needed to exploit more advanced Illumos features like KVM, if I ever needed it. I also considered the Socket 2011 Xeon E5 chips, but decided that I really didn't need more than 32GB of RAM for the foreseeable future. So with that in mind, I asked OmniTI's Supermicro sales rep to put together a box for me. Here's what I got:
  • Intel Xeon E3 1265L v3 - This CPU has a TDP of 45 watts, that's 40% of the TDP of the old CPU. It clocks slightly slower, but otherwise is quite the upgrade with 4 cores, hyperthreading (looking like 8 CPUs to Illumos), and all of the modern bells and whistles like VT-x with EPT and AES-NI. It also is being used in at least one shipping illumos-driven product, which is nice to know.
  • Supermicro X10SLM-LN4F motherboard - This motherboard has four Intel I210 Gigabit ethernet ports on it. I only need two for now, thanks to Crossbow, but I have plans that my paranoia about separate physical LANs may require one or both of those last two. I'm using all four of its 6Gbit SATA ports, and it has two more 3Gbit ones for later. (I'll probably move the SSDs to the 3Gbit ones, because of latency vs. throughput, if I go to a 4-spinning-rust storage setup.) I've disabled USB3 for now, but if/when illumos supports it, I'll be able to test it here.
  • 32 GB of ECC RAM - Maxxed out now. So far, this hasn't been a concern.
  • Same drives as the old one - I moved them right over from the old setup. Installed OmniOS (see below), but basically did "zpool split", "zpool export" from the old server, and "zpool import" on the new one. ZFS again for the win! (A rough sketch of that shuffle follows this list.)
  • Supermicro SC732D4 - The case, while not QUITE as cabling-friendly as the old Lian Li, has plastic disk trays that are an improvement over just screwing them in place on the Lian Li. The case comes standard with a four-disk 3.5" cage, and I added a four-disk 2.5" cage to mine. The 500W power supply seems to be an energy improvement, and is DEFINITELY quieter.
  • OmniOS r151010 - For my home server use, I'm going to be using the stable OmniOS release, which as of very recently became r151010. Every six months, therefore, I'll be getting a new OmniOS to use on this server. I haven't tried installing X or twm just yet, but that, and possibly printer support for my USB color printer, are the only things lacking over my old OI install.
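
A rough sketch of the pool shuffle mentioned in the drive bullet above; pool and host names are made up, so treat it as illustrative rather than a transcript:

oldserver# zpool split tank tanknew      # detach one side of the mirror into a new pool
oldserver# zpool export tanknew
(move the drives to the new machine)
newserver# zpool import tanknew tank     # import, renaming back to the original name
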
I've had this hardware running for about two weeks now. It does everything the old server did, and a few new things.
  • File Service - NFS, and as of very recently, CIFS as well. The latter is entirely to enable scan-to-network-disk scanning. This happens in the global zone, on the "internal network" NIC.
  • Router - This is a dedicated zone which serves as the default router and NAT box. It also redirects external web and Minecraft requests (see below) to their respective zones. It also serves as an IPsec-protected remote access point. Ex-Sun people will know exactly what I'm talking about. It uses an internal vNIC, and a dedicated external NIC.
  • Webserver - As advertised. Right now it just serves static content on port 80 (www.kebe.com), but I may expand this, if I don't put HTTPS service in another zone later. This sits on an internal vNIC, and its inbound traffic is directed by the NAT/Router.
  • Minecraft - My children discovered Minecraft in the past year or so. Turns out, Illumos does a good job of serving Minecraft. With this new server, and running the processes as 32-bit ones (implicit 4Gig limit), I can host two Minecraft servers easily now. This sits on an internal vNIC as well.
  • Work - For now, this is just a place for me to store files for my job and build things. Soon, I plan on using another IPsec tunnel in the Router zone, an etherstub, and making this a part of my office, sitting in my house. Once that happens, I'll be using a dedicated NIC (for separation) to plug my work-issued laptop into.
  • Remote printing - I have a USB color printer that the global zone can share (via lpd). To be honest, I don't have this working on OmniOS just yet, but I'll get that back.
  • DHCP and DNS - Some people assume these are part of a router, but that's not necessarily the case. In this new instantiation, they'll live in the same zone as the webserver (which has a default route installed but is NOT the router). For this new OmniOS install, I'm switching to the ISC DHCP daemon. I hope to upstream it to omnios-build after some operational experience.
Not quite two weeks now, and so far, so good. My kids haven't noticed any lags in Minecraft, and I've built illumos-gate from scratch, both DEBUG and non-DEBUG, in less than 90 minutes. We'll see how DHCP holds up when Homeschool Book Club shows up with Moms carrying smartphones, tablets, and laptops, plus even a kid or two bringing a Minecraft-playing laptop as well for after the discussion.

It's just me, I think, but return()s in loops can be bad. Kebe Says: Dan McD's blog

I was reviewing some code tonight. It was a simple linked-list match which originally looked like:


obj_t *
lookup(match_t key)
{
        obj_t *p;

        for (p = list_head(); p; p = list_next()) {
                if (p->val == key)
                        return (p);
        }

        return (NULL);
}

Not bad. But it turns out the list in question needed mutually exclusive access, so the reviewee inserted the mutex into this code.


obj_t *
lookup(match_t key)
{
        obj_t *p;

        mutex_enter(list_lock());
        for (p = list_head(); p; p = list_next()) {
                if (p->val == key) {
                        mutex_exit(list_lock());
                        return (p);
                }
        }

        mutex_exit(list_lock());
        return (NULL);
}

Eeesh, two places to call mutex_exit(). I suppose a good compiler would recognize the common basic blocks and optimize them out, but that's still mildly ugly to look at. Still, the above code just rubbed me the wrong way, even though I KNOW there are other bits of Illumos that are like the above. I didn't block the reviewee, but I did write down what I thought it should look like:


obj_t *
lookup(match_t key)
{
        obj_t *p;

        mutex_enter(list_lock());

        p = list_head();
        while (p != NULL && p->val != key)
                p = list_next();

        mutex_exit(list_lock());
        return (p);
}

That seems simpler. The operation is encapsulated in the mutex_{enter,exit} section, and there are no escape hatches save those in the while boolean. (It's the always-drop-the-mutex-upon-return that makes language constructs like monitors look appealing.)

I think I'm probably making a bigger deal out of this than I should, but the last code looks more readable to me.

One thing the reviewee suggested to me was that a for loop like before, but with breaks, would be equally clean w.r.t. only having one place to drop the mutex. I think the reviewee is right, and it allows for more sophisticated exits from a loop.

Some people would even use "goto fail" here, but we know what can happen when that goes wrong. :)

Greetings from OmniTI Kebe Says: Dan McD's blog

Hello again, world!

OmniTI gave me an opportunity to get back into the networking stack, while still having the ability to stay a jack-of-all-trades at least some of the time. It was a hard decision to make, but as of this past Monday, I'm now at OmniTI. My first week I'm down here in Maryland at HQ, but I'll be working from my house primarily. I hope also with this new job to appear at conferences a bit more, and meet more illumos users and developers in person, especially OmniOS ones.

What I learned from my Atari 8-bit days Kebe Says: Dan McD's blog

Another embedded tweet!

Happy Throwback Thursday! Some time ago, also on Throwback Thursday, I tweeted a link to a document I wish I had when I was much younger:

I wish I'd had it, because it may have helped me save my first 8-bit Atari computer (an 800XL) from having its POKEY chip fried by a dumb copy-ROM-into-RAM loop. Beyond learning not to blindly write into hardware registers, my Atari 8-bits ended up teaching me a surprising amount. A fair amount of what I learned helped me mature into a proper Computer Scientist and Software Engineer.

Be Careful of the Next Version

I generally look forward to upgrades. Bugs get fixed, features get added, things move faster, and if you're really lucky, you get more than one of those with one upgrade. It doesn't always turn out nicely, though. Sometimes, the next version changes things enough where things that once worked no longer do. Other times, the next version just plain sucks.

8-bit Atari owners had two serious negative encounters - one of each kind. The unexpected change was the transition from the original 400 & 800 models to the XL (and later XE) series. The reason this was a problem is actually best described later.

Atari's DOS (almost every 8-bit machine's disk drivers were called "DOS") lingered on version 2.0 from 1980 until 1984. To accompany new "enhanced density" 5.25" floppy drives, Atari released DOS 3. DOS 3 falls squarely into the, "just plain sucks," category. It was a poor design, including such misfeatures as:

  • Larger block sizes (2048 bytes vs. 128 bytes), which led to wasted disk space and sometimes less overall capacity if anything barely spilled into the next block
  • One-way migration. Once your data moved to DOS 3, it wasn't going back.
  • An overbearing help system that took up disk space (already at a premium).
I didn't know what it was called at the time, but DOS 3 suffered from the Second-System Effect. Luckily, Atari ended up offering DOS 2.5, which looked like DOS 2.0, save for both support for enhanced-density floppies, AND the ability to migrate DOS 3 files back to DOS 2.x.

Declare Your String Sizes

Jumping from Pascal or even BASIC to a language like C could be confusing to some. "What do you mean strings are just a character array?" If you cut your teeth on Atari BASIC, you already had an inkling of what was going on.

The classic Microsoft BASIC took up more than the 8K bytes that 8-bit Ataris had reserved for the cartridge slot. The resulting shrinkage of Atari BASIC included the array-like requirements for strings. On classic Microsoft BASIC:


10 A$="HELLO, WORLD"
20 PRINT "THE TEST STRING IS: ", A$
But you had to declare the string size in Atari BASIC:

5 DIM A$(100)
10 A$="HELLO, WORLD"
20 PRINT "THE TEST STRING IS: ", A$
One could not have an array of strings in Atari BASIC, and some of the classic BASIC array operators took on new significance in Atari BASIC. See here for a treatise on the subject.

Don't Depend on Implementation Details

I mentioned the transition from the 400 and 800 to the XL series. Several pieces of software broke when they loaded onto an XL. The biggest reason for this was that these programs, to save cycles, would jump directly into various ROM routines that were supposed to be accessed through a documented table of JMP instructions. To save the three cycles of an additional JMP, programs would often inline the table entries. The XL series included a rewritten ROM, which scrambled a large portion of where these routines were implemented. BOOM, no more working code.

Atari, to their credit, released a "Translator" boot disk, which loaded a variant of the old 800 ROM into the XL's extended, bank-switched RAM and ran the system from that. This allowed the broken software to continue to work.

You WILL Have Rejected Submissions

Owning an 8-bit Atari meant you subscribed to at least one of Antic or ANALOG. I was an ANTIC subscriber until I graduated high school. I even tried, twice, to submit type-in programs with accompanying articles to ANTIC. Both were terrible, and rightly rejected by the editor. I'm honestly afraid to remember what they were.

And William Gibson's a Pretty Good Writer

Speaking of Antic, check out this article from September 1985, especially Part 3. I immediately scoured the Waukesha County Library System trying to find Neuromancer, and wasn't disappointed... not at all. 16-year-old me really liked this book, and wouldn't have discovered it before college were it not for ANTIC, which I wouldn't have read without my 8-bit Atari.

Goodbye blogspot Kebe Says: a blog by Dan McD.

First off, long time no blog!

This is the last post I'm putting on the Blogspot site. In the spirit of eating my own dogfood, I've now set up a self-hosted blog on my HDC. I'm sure it won't be hard for all half-dozen of you readers to move over. I'll have new content over there, at the very least the Hello, World post, a catchup post, and a HDC 3.0 post to match the ones for 1.0 and 2.0.

Solaris 10 zones on Tribblix The Trouble with Tribbles...

One of the interesting capabilities of Solaris zones was the ability to run older versions of Solaris than that in the global zone. Marketing managed to mangle this into Containers, and it was supported for Solaris 8 and Solaris 9.

I used this extensively on one project, to lift a whole datacenter of ancient (yes, really ancient) Sun servers into zones on a couple of T5240s. Worked great. (We had to get an E450 out of the dumpster and build it specially to get a Solaris 2.6 system, however.)

Solaris 11 and illumos have dropped the Solaris 8 and 9 legacy containers, but have a Solaris 10 zone brand. On Tribblix, this can be installed with

zap install TRIBsys-zones-brand-s10

(If you're on an IPS based distro, the package name is system/zones/brand/s10.)

Installing an s10-branded zone is just like installing a regular zone, but you need a Solaris 10 image to install from. You could tar up a legacy system, or create a new image from the install media.

There are certain requirements for the software in the image and on the host. If, on a Solaris 10 system, you look in the directory /usr/lib/brand/solaris10, you might see a couple of files called 0 and 1. They have a little bit of text in them for explanation, but these are emulation compatibility feature flags. If you look at the illumos source, you can see them listed too. This is a basic versioning system - the host running the global zone needs to support all the features of the software in the zone. Fortunately that feature list hasn't changed, so we're good.
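
If you're curious, it's easy to inspect those feature flags by hand on a Solaris 10 box (a quick sketch, not from the original article; exactly which files are present will depend on the update level):

# list the emulation compatibility feature flags and read their
# explanatory text
ls /usr/lib/brand/solaris10
cat /usr/lib/brand/solaris10/0 /usr/lib/brand/solaris10/1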

For the image, the s10 brand checks the SUNWcakr package and needs it to be patched to a minimum level. In practice, this means that anything S10U8 or newer will work.

During zone installation, there's some sanity checking. It turns out the installer is looking for /var/sadm/system/admin/INST_RELEASE and gives up if it can't find it. I had to manually create that file:

cat > ......./var/sadm/system/admin/INST_RELEASE
OS=Solaris
VERSION=10
REV=0
^D

There's also some processing of the zone image that's done as part of the zone installation, and if that processing fails, the zone install will fail too.

It has a hardcoded list of safe_dir entries for things it needs to replace. If you don't have those, it simply fails, so you need to add a bunch of packages just to make it happy. Specifically: autofs, zfs, and ipf.

The postprocessing also runs sys-unconfig, so you need to make sure that's present, from SUNWadmap.

Enough chatter. If I have my Solaris 10 media under /mnt1, and want to create an image under /export/S10, then it's going to look like this:

cd /mnt1/Solaris_10/Product
# SUNWCcs SUNWcar SUNWcakr SUNWkvm
pkgadd -d . -R /export/S10 SUNWcsr
pkgadd -d . -R /export/S10 SUNWckr
pkgadd -d . -R /export/S10 SUNWcnetr
pkgadd -d . -R /export/S10 SUNWcsd
pkgadd -d . -R /export/S10 SUNWcsl
pkgadd -d . -R /export/S10 SUNWcsu
pkgadd -d . -R /export/S10 SUNWcar.i
pkgadd -d . -R /export/S10 SUNWcakr.i
pkgadd -d . -R /export/S10 SUNWkvm.i
pkgadd -d . -R /export/S10 SUNWcslr
# SUNWCfmd
pkgadd -d . -R /export/S10 SUNWfmdr
pkgadd -d . -R /export/S10 SUNWfmd
# SUNWClexpt
pkgadd -d . -R /export/S10 SUNWlexpt
# SUNWCpicl
pkgadd -d . -R /export/S10 SUNWpiclr
pkgadd -d . -R /export/S10 SUNWpiclu
# SUNWCopenssl SUNWhea (but not man)
pkgadd -d . -R /export/S10 SUNWopensslr
pkgadd -d . -R /export/S10 SUNWopenssl-libraries
pkgadd -d . -R /export/S10 SUNWhea
pkgadd -d . -R /export/S10 SUNWopenssl-include
pkgadd -d . -R /export/S10 SUNWopenssl-commands
# SUNWCpkgcmds SUNWwbsup
pkgadd -d . -R /export/S10 SUNWproduct-registry-root
pkgadd -d . -R /export/S10 SUNWwsr2
pkgadd -d . -R /export/S10 SUNWpkgcmdsr
pkgadd -d . -R /export/S10 SUNWwbsup
pkgadd -d . -R /export/S10 SUNWpkgcmdsu
# SUNWCpr SUNWCtls
pkgadd -d . -R /export/S10 SUNWpr
pkgadd -d . -R /export/S10 SUNWtls
pkgadd -d . -R /export/S10 SUNWjss
# SUNWCfwshl
pkgadd -d . -R /export/S10 SUNWbash
pkgadd -d . -R /export/S10 SUNWtcsh
pkgadd -d . -R /export/S10 SUNWzsh
# perl
pkgadd -d . -R /export/S10 SUNWperl584core
pkgadd -d . -R /export/S10 SUNWperl584usr
# SUNWCptoo SUNWtecla SUNWesu SUNWtoo
pkgadd -d . -R /export/S10 SUNWtecla
pkgadd -d . -R /export/S10 SUNWbtool
pkgadd -d . -R /export/S10 SUNWesu
pkgadd -d . -R /export/S10 SUNWcpp
pkgadd -d . -R /export/S10 SUNWtoo
pkgadd -d . -R /export/S10 SUNWlibmr
pkgadd -d . -R /export/S10 SUNWlibm
pkgadd -d . -R /export/S10 SUNWlibmsr
pkgadd -d . -R /export/S10 SUNWlibms
pkgadd -d . -R /export/S10 SUNWsprot
# SUNWCfwcmp SUNWlibC
pkgadd -d . -R /export/S10 SUNWlibC
pkgadd -d . -R /export/S10 SUNWbzip
pkgadd -d . -R /export/S10 SUNWgzip
pkgadd -d . -R /export/S10 SUNWzip
pkgadd -d . -R /export/S10 SUNWzlib
# release and sys-unconfig
pkgadd -d . -R /export/S10 SUNWsolnm
pkgadd -d . -R /export/S10 SUNWadmr
pkgadd -d . -R /export/S10 SUNWadmlib-sysid
pkgadd -d . -R /export/S10 SUNWadmap
# autofs is needed for validation
pkgadd -d . -R /export/S10 SUNWatfsr
pkgadd -d . -R /export/S10 SUNWatfsu
# ditto zfs
pkgadd -d . -R /export/S10 SUNWlxml
pkgadd -d . -R /export/S10 SUNWsmapi
pkgadd -d . -R /export/S10 SUNWzfskr
pkgadd -d . -R /export/S10 SUNWzfsr
pkgadd -d . -R /export/S10 SUNWzfsu
# ditto ipf
pkgadd -d . -R /export/S10 SUNWipfr
pkgadd -d . -R /export/S10 SUNWipfu
# It's about 235M at this point

Now cd to /export/S10, make sure the INST_RELEASE file is there with the correct contents (see above), and tar up what you have. You can then feed that tarball to the zone installation and it should work.
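
For completeness, here's a hedged sketch of that last step. The brand name (solaris10, as in illumos-gate) and the install options (-a to supply an archive, -u to sys-unconfig it) are assumptions about the s10 brand installer, and the zonecfg shown is the bare minimum with networking left out:

# create the tarball from the populated image
cd /export/S10
tar cf /tmp/S10.tar .

# configure a minimal s10-branded zone by hand and install from the archive
zonecfg -z s10-test 'create -b; set brand=solaris10; set zonepath=/zones/s10-test'
zoneadm -z s10-test install -u -a /tmp/S10.tar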

If you look at documentation for s10 zones on Solaris 11, you'll see a -c option. We don't have that, but you can drop a sysidcfg file at /etc/sysidcfg inside the zone so that it configures itself at boot. It will look something like this:

system_locale=C
terminal=xterm
# shared
network_interface=primary {
    hostname=${ZNAME}
}
# exclusive
network_interface=primary {
    hostname=${ZNAME}
    ip_address=192.168.0.101
    netmask=255.255.255.0
    default_route=NONE
    protocol_ipv6=no
}
security_policy=NONE
name_service=DNS {
    domain_name=special.example.com
    name_server=192.168.112.3,1.2.3.4
}
nfs4_domain=dynamic
timezone=${TZ:=UTC}
root_password=jhDYXBAnWJKr2

If you're using Tribblix, most of the zone creation is simplified, and it will be:

zap create-zone -t s10 -z s10-test4 -I /tmp/S10.tar -i 192.168.0.224

I haven't tried this on SPARC (my use case is building Java and Node.js), but it ought to be exactly the same modulo trivial changes to package names.

Docker, Go and USDT Staring at the C

We have what should be a simple task: we're on CentOS 7, and we want to deploy a Go binary that will have USDT tracepoints. USDT is attractive for debugging because it allows applications to define tracepoints with greater stability and semantic meaning than more ad-hoc methods like dynamic uprobes.

Usage of USDT tracepoints tends to have a different focus from other monitoring techniques like logging, Prometheus, OpenTracing etc. These might identify a general issue such as a poor latency metric: you’d then use USDT probes to dig further into the problems in a production system, to identify precisely what’s happening at a particular endpoint or whatever.

USDT in Go

The normal model for USDT involves placing the trace points at specific places in the binary: they are statically defined and built, but dynamically enabled. This is typically done via the DTRACE_PROBE() family of macros.

The only (?) USDT facility for Go is salp. This uses libstapsdt under the hood. This library dynamically creates probes at runtime, even though Go is a compiled language. Yes, this is dynamic static dynamic tracing.

We’re going to use salpdemo in our experiment. This has two USDT probes, p1 and p2 that we’d like to be able to dynamically trace, using bcc-tools’ handy trace wrapper. CentOS 7 doesn’t appear to have support for the later USDT support in perf probe.

Setting up a Docker container for dynamic tracing

For a few different reasons, we’d like to be able to trace from inside the container itself. This has security implications, given what’s implemented today, but bear in mind we’re on CentOS 7, so even if there’s a finer-grained current solution, there’s a good chance it wouldn’t work here. In reality, we would probably use an ad-hoc debugging sidecar container, but we’re going to just use the one container here.

First, we’re going to deploy the container with ansible for convenience:

    $ cat hosts
localhost ansible_connection=local
$ cat playbook.yml
---

- hosts: localhost
  become: yes
  tasks:
    - docker_container:
        name: usdt_test
        image: centos:7
        state: started
        command: sleep infinity
        network_mode: bridge
        ulimits:
          - memlock:8192000:8192000
        capabilities:
          - sys_admin
        volumes:
          - /sys/kernel/debug:/sys/kernel/debug
$ ansible-playbook -i hosts ./playbook.yml

Note that we’re using sleep infinity here to keep our container running so we can play around.

We need the sys_admin capability to be able to program the probes, and the BPF compiler needs the locked memory limit bumping. We also need to mount /sys/kernel/debug read-write (!) in order to be able to write to /sys/kernel/debug/tracing/uprobe_events.
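
For reference, outside of ansible the same container could be started with a plain docker run along these lines (a sketch matching the playbook above):

# sys_admin capability, raised memlock limit and debugfs mounted in,
# equivalent to the docker_container task in the playbook
docker run -d --name usdt_test \
    --network bridge \
    --cap-add sys_admin \
    --ulimit memlock=8192000:8192000 \
    -v /sys/kernel/debug:/sys/kernel/debug \
    centos:7 sleep infinity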

Now let’s install everything we need to be able to trace these probes:

    $ docker exec -it usdt_test yum -y install \
    kernel-devel-$(uname -r) kernel-$(uname -r) bcc-tools

Yes, it’s a lot, but unavoidable. You can, in theory, use mounted volumes for the kernel sources, as described here; however, the read-only mounts break packaging inside the container, so we’re not doing that here.

Tracing the probes in the container

The above was a big hammer, but we should be good to go, right? Let's start up the demo binary:

    $ docker cp ~/salpdemo usdt_test:/root/
$ docker exec -it usdt_test bash
[root@8ccf34663dd2 /]# ~/salpdemo &
[1] 18166
 List the go probes in this demo with
        sudo tplist -vp "$(pgrep salpdemo)" "salp-demo*"
Trace this process with
        sudo trace -p "$(pgrep salpdemo | head -n1)" 'u::p1 "i=%d err=`%s` date=`%s`", arg1, arg2, arg3' 'u::p2 "j=%d flag=%d", arg1, arg2'
        or
        sudo trace -p "$(pgrep salpdemo | head -n1)" 'u::p1 (arg1 % 2 == 0) "i=%d err='%s'", arg1, arg2'

We can indeed list the probes:

    [root@8ccf34663dd2 /]# /usr/share/bcc/tools/tplist -vp $(pgrep salpdemo) | head
salp-demo:p1 [sema 0x0]
  1 location(s)
  3 argument(s)
salp-demo:p2 [sema 0x0]
  1 location(s)
  2 argument(s)
libc:setjmp [sema 0x0]
...

So let’s try the suggested trace invocation:

    # /usr/share/bcc/tools/trace -p "$(pgrep salpdemo | head -n1)" 'u::p1 (arg1 % 2 == 0) "i=%d err='%s'", arg1, arg2'

perf_event_open(/sys/kernel/debug/tracing/events/uprobes/p__tmp_salp_demo_I8qitQ_so_0x270_18166_bcc_18175/id): Invalid argument
Failed to attach BPF to uprobe

Huh. This doesn’t seem to be a permissions issue, since we got EINVAL. In addition, running from the host has the same problem.

I haven’t proved it, but I think our basic issue here is that Centos 7 is missing this kernel fix:

tracing/uprobe: Add support for overlayfs

I spent way too long trying to work around this by placing the binary somewhere other than overlayfs, before I finally dug a little bit more into how libstapsdt actually works, and figured out the problem.

Working around overlayfs and libstapsdt

To build probes dynamically at runtime, libstapsdt does something slightly crazy: it generates a temporary ELF shared library containing the USDT probes and uses dlopen() to bring it into the running binary. Let's have a look:

    [root@8ccf34663dd2 /]# grep salp-demo /proc/$(pgrep salpdemo)/maps
7fa9373b5000-7fa9373b6000 r-xp 00000000 fd:10 1506373                    /tmp/salp-demo-I8qitQ.so
7fa9373b6000-7fa9375b5000 ---p 00001000 fd:10 1506373                    /tmp/salp-demo-I8qitQ.so
7fa9375b5000-7fa9375b6000 rwxp 00000000 fd:10 1506373                    /tmp/salp-demo-I8qitQ.so

The process has mapped in this temporary file, named after the provider. It's on /tmp, hence on the overlay2 filesystem, which explains why moving the salpdemo binary itself around made no difference.
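
A quick sanity check (not part of the original session) confirms what /tmp actually is inside the container:

# expect an overlay filesystem in a default CentOS 7 container
df -T /tmp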

So maybe we can be more specific?

    [root@8ccf34663dd2 /]# /usr/share/bcc/tools/trace -p "$(pgrep salpdemo | head -n1)" 'u:/tmp/salp-demo-I8qitQ.so:p1 (arg1 % 2 == 0) "i=%d err='%s'", arg1, arg2'
perf_event_open(/sys/kernel/debug/tracing/events/uprobes/p__tmp_salp_demo_I8qitQ_so_0x270_18166_bcc_18188/id): Invalid argument
Failed to attach BPF to uprobe

Still not there yet. The above bug means that it still can’t find the uprobe given the binary image path. What we really need is the host path of this file. We can get this from Docker:

    $ docker inspect usdt_test | json -a GraphDriver.Data.MergedDir
/data/docker/overlay2/77c1397db72a7f3c7ba3f8af6c5b3824dc9c2ace9432be0b0431a2032ea93bce/merged

This is not good, as obviously we can’t reach this path from inside the container. Hey, at least we can run it on the host though.

    $ sudo /usr/share/bcc/tools/trace 'u:/data/docker/overlay2/77c1397db72a7f3c7ba3f8af6c5b3824dc9c2ace9432be0b0431a2032ea93bce/merged/tmp/salp-demo-I8qitQ.so:p1 (arg1 % 2 == 0) "i=%d err='%s'", arg1, arg2'
Event name (p__data_docker_overlay2_77c1397db72a7f3c7ba3f8af6c5b3824dc9c2ace9432be0b0431a2032ea93bce_merged_tmp_salp_demo_I8qitQ_so_0x270) is too long for buffer
Failed to attach BPF to uprobe

SIGH. Luckily, though:

    $ sudo /usr/share/bcc/tools/trace 'u:/data/docker/overlay2/77c1397db72a7f3c7ba3f8af6c5b3824dc9c2ace9432be0b0431a2032ea93bce/diff/tmp/salp-demo-I8qitQ.so:p1 (arg1 % 2 == 0) "i=%d err='%s'", arg1, arg2'
PID     TID     COMM            FUNC             -
19862   19864   salpdemo        p1               i=64 err=An error: 64
19862   19864   salpdemo        p1               i=66 err=An error: 66

It worked! But it’s not so great: we wanted to be able to trace inside a container. If we mounted /data/docker itself inside the container, we could do that, but it’s still incredibly awkward.

Using tmpfs?

Instead, can we get the generated file onto a different filesystem type? libstapsdt hard-codes /tmp, which limits our options.

Let’s start again with /tmp inside the container on tmpfs:

    $ tail -1 playbook.yml
        tmpfs: /tmp:exec

We need to force on the exec mount flag here: otherwise, we can't dlopen() the generated file. Yes, again, not great for security.

    $ docker exec -it usdt_test bash
# ~/salpdemo &
...
[root@1f56af6e7bee /]# /usr/share/bcc/tools/trace -p "$(pgrep salpdemo | head -n1)" 'u::p1 "i=%d err=`%s` date=`%s`", arg1, arg2, arg3' 'u::p2 "j=%d flag=%d", arg1, arg2'
PID     TID     COMM            FUNC             -

Well, we’re sort of there. It started up, but we never get any output. Worse, we get the same if we try this in the host now! I don’t know what the issue here is.

Using a volume?

Let’s try a volume mount instead:

    $ tail -3 playbook.yml
        volumes:
          - /sys/kernel/debug:/sys/kernel/debug
          - /tmp/tmp.usdt_test:/tmp

If we run trace in the host now, we can just use u::p1:

    $ sudo /usr/share/bcc/tools/trace -p "$(pgrep salpdemo | head -n1)" 'u::p1 "i=%d err=`%s` date=`%s`", arg1, arg2, arg3' 'u::p2 "j=%d flag=%d", arg1, arg2'
PID     TID     COMM            FUNC             -
6864    6866    salpdemo        p2               j=120 flag=1
...

But we still need a bit of a tweak inside our container:

    # /usr/share/bcc/tools/trace -p "$(pgrep salpdemo | head -n1)" 'u::p1 "i=%d err=`%s` date=`%s`", arg1, arg2, arg3'
PID     TID     COMM            FUNC             -
<no output>
    [root@d72b822cab0f /]# cat /proc/$(pgrep salpdemo | head -n1)/maps | grep /tmp/salp-demo*.so | awk '{print $6}' | head -n1
/tmp/salp-demo-6kcugm.so
[root@d72b822cab0f /]# /usr/share/bcc/tools/trace -p  "$(pgrep salpdemo | head -n1)" 'u:/tmp/salp-demo-6kcugm.so:p1 "i=%d err=`%s` date=`%s`", arg1, arg2, arg3'
PID     TID     COMM            FUNC             -
11593   11595   salpdemo        p1               i=-17 err=`An error: -17` date=`Thu, 06 Aug 2020 13:12:57 +0000`
...

I don’t have any clear idea why the name is required inside the container context, but at least, finally, we managed to trace those USDT probes!

Running a Zabbix server in an OmniOS zone OmniOS Community Edition

This guide shows how to get Zabbix up and running within a zone on an OmniOS system. Zabbix is an open-source monitoring system and is available from the OmniOS Extra repository.

Zone setup

I’m going to use the lightweight sparse zone brand for this, so start by making sure that it is installed:

        % pfexec pkg install brand/sparse
No updates necessary for this image.

If the brand is not already installed, then there will be more output from the above command.

Create a new sparse zone called zabbix. My preference is to configure the IP stack within the zone configuration, which results in it being applied automatically when the zone boots, and protects settings such as the IP address from being changed from within the zone. Note that, like all zones, the zone path must be a direct descendant of a ZFS dataset. On my system, /zones is the mount point for such a dataset.

        % pfexec zonecfg -z zabbix
zabbix: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:zabbix> create -t sparse
zonecfg:zabbix> set zonepath=/zones/zabbix
zonecfg:zabbix> add net
zonecfg:zabbix:net> set physical=zabbix0
zonecfg:zabbix:net> set global-nic=igb0
zonecfg:zabbix:net> set allowed-address=172.27.10.35/24
zonecfg:zabbix:net> set defrouter=172.27.10.254
zonecfg:zabbix:net> end
zonecfg:zabbix> add attr
zonecfg:zabbix:attr> set name=resolvers; set type=string; set value=1.1.1.1
zonecfg:zabbix:attr> end
zonecfg:zabbix> add attr
zonecfg:zabbix:attr> set name=dns-domain; set type=string; set value=omnios.org
zonecfg:zabbix:attr> end
zonecfg:zabbix> verify; commit; exit

By default, the zone’s boot environment will encompass all of the files and directories within it. For an application such as Zabbix, it’s important to create a dedicated area to hold files such as the underlying database, which should remain consistent across different boot environments.

My system has a ZFS pool called data so I’m going to create a new dataset under that and delegate it to the zone. I’m also going to change the mount point for the dataset to /data so that it appears there within the zone.

        % pfexec zfs create data/zabbix
% pfexec zonecfg -z zabbix 'add dataset; set name=data/zabbix; end'

% pfexec zfs umount data/zabbix
% pfexec zfs set mountpoint=/data data/zabbix
% pfexec zfs set zoned=on data/zabbix

Now it’s time to install the zone. Being a sparse zone, this will be pretty quick - only around 5MiB of files are actually installed.

        % pfexec zoneadm -z zabbix install
A ZFS file system has been created for this zone.

       Image: Preparing at /zones/zabbix/root.
Sanity Check: Looking for 'entire' incorporation.
   Publisher: Using omnios (https://pkg.omnios.org/r151034/core/).
   Publisher: Using extra.omnios (https://pkg.omnios.org/r151034/extra/).
       Cache: Using /var/pkg/publisher.
  Installing: Packages (output follows)
Packages to install: 203
Mediators to change:   5
 Services to change:   6

DOWNLOAD                                PKGS         FILES    XFER (MB)   SPEED
Completed                            203/203     1485/1485      4.9/4.9      --

PHASE                                          ITEMS
Installing new actions                     5927/5927
Updating package state database                 Done
Updating package cache                           0/0
Updating image state                            Done
Creating fast lookup database                   Done
 Postinstall: Copying SMF seed repository ... done.
        Done: Installation completed in 16.942 seconds.

Let’s boot the zone and log in:

        % pfexec zoneadm -z zabbix boot
% pfexec zlogin zabbix
[Connected to zone 'zabbix' pts/17]
OmniOS 5.11 omnios-r151034-831ff8e83b   July 2020
root@zabbix#

Since this is the first boot, it will take a minute for all of the service manifests to be imported. Watch the output of svcs -x until nothing is returned:

        root@zabbix# svcs -x
root@zabbix#

Check Internet connectivity and DNS:

        root@zabbix# ping 1.1.1.1
1.1.1.1 is alive
root@zabbix# ping google.com
google.com is alive

and check the delegated dataset:

        root@zabbix# df -h /data
Filesystem             Size   Used  Available Capacity  Mounted on
data/zabbix           3.51T    42K      1.57T     1%    /data

Install the zabbix server package, which will automatically install the correct version of dependencies:

        root@zabbix# pkg install zabbix-server
           Packages to install:  6
           Mediators to change:  1
            Services to change:  4
       Create boot environment: No
Create backup boot environment: No

DOWNLOAD                                PKGS         FILES    XFER (MB)   SPEED
Completed                                6/6     3082/3082    22.7/22.7  1.2M/s

PHASE                                          ITEMS
Installing new actions                     4013/4013
Updating package state database                 Done
Updating package cache                           0/0
Updating image state                            Done
Creating fast lookup database                   Done
Updating package cache                           2/2

Database

The OmniOS Zabbix package needs a database for storage. The database can be either Postgres or MariaDB, both of which are available as packages from the OmniOS Extra repository. The default supported database is Postgres, and that is the database that will be used in the rest of this walkthrough. To switch Zabbix over to using MariaDB, it is necessary to change the zabbix mediator:

        root@zabbix# pkg set-mediator -I mariadb zabbix

Whichever database is used, the underlying database files are one of the things that should be stored on the dedicated dataset that was delegated to the zone.

Create a new ZFS filesystem for the database. For a Postgres database, it’s recommended to set the filesystem recordsize to 8K, and to set the log bias mode to throughput, as shown here. Also for security, executable, setuid and device files are explicitly disabled on the filesystem.

        root@zabbix# zfs create data/zabbix/db
root@zabbix# zfs set recordsize=8k data/zabbix/db
root@zabbix# zfs set logbias=throughput data/zabbix/db
root@zabbix# zfs set exec=off data/zabbix/db
root@zabbix# zfs set devices=off data/zabbix/db
root@zabbix# zfs set setuid=off data/zabbix/db

This new dataset inherits the mountpoint from the filesystem:

        root@zabbix# df -h | grep data/zab
data/zabbix           3.51T    42K      1.57T     1%    /data
data/zabbix/db        3.51T    42K      1.57T     1%    /data/db

Set up the initial database in the dedicated ZFS dataset:

        root@zabbix# chown postgres /data/db
root@zabbix# chmod 0700 /data/db
root@zabbix# svccfg -s postgresql12:default \
        setprop application/datadir = /data/db
root@zabbix# svcadm refresh postgresql12:default
root@zabbix# cd /data/db
root@zabbix:/data/db# sudo -u postgres /opt/ooce/pgsql-12/bin/initdb -D .
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "C".
The default database encoding has accordingly been set to "SQL_ASCII".
The default text search configuration will be set to "english".

Data page checksums are disabled.
...
Success. You can now start the database server using:

Start the database service using svcadm:

        root@zabbix# cd
root@zabbix# svcadm enable postgresql12
root@zabbix# svcs postgresql12
STATE          STIME    FMRI
online         12:32:12 svc:/ooce/database/postgresql12:default

Create the zabbix database user - enter a password to secure the account when prompted.

        root@zabbix# sudo -u postgres createuser --pwprompt zabbix
Enter password for new role:
Enter it again:

Create the database and import initial data:

        root@zabbix# sudo -u postgres createdb -O zabbix \
        -E Unicode -T template0 zabbix
root@zabbix# cd /opt/ooce/zabbix/sql/pgsql
root@zabbix:/opt/ooce/zabbix/sql/pgsql# cat schema.sql images.sql data.sql \
        | sudo -u zabbix psql zabbix
... lots of output, not shown here ...
COMMIT

Web interface

Zabbix comes with a web interface written in PHP. I’m going to use the nginx web server to serve this over HTTP.

Install packages:

        root@zabbix# pkg install nginx php-74
...
DOWNLOAD                                PKGS         FILES    XFER (MB)   SPEED
Completed                              10/10       517/517    22.9/22.9  2.2M/s
...

Edit the /etc/opt/ooce/nginx/nginx.conf file and replace the example server block in there with the following:

            server {
        listen       80;
        server_name  localhost;
        root /opt/ooce/zabbix/ui;
        index index.php;
        location ~ \.php$ {
                try_files $uri =404;
                fastcgi_pass unix:/var/opt/ooce/php/run/www-7.4.sock;
                fastcgi_index index.php;
                include fastcgi.conf;
        }
    }

A few PHP settings need to be tweaked for proper Zabbix operation. This example sets the time zone to UTC but you can set it to local time if you prefer.

        root@zabbix# cd /etc/opt/ooce/php-7.4/
root@zabbix# sed -i '/post_max_size/s/=.*/= 16M/' php.ini
root@zabbix# sed -i '/execution_time/s/=.*/= 300/' php.ini
root@zabbix# sed -i '/input_time/s/=.*/= 300/' php.ini
root@zabbix# sed -i '/date.timezone/s/.*/date.timezone = UTC/' php.ini

Grant PHP permissions to manage the zabbix UI configuration file:

        root@zabbix# chown php /opt/ooce/zabbix/ui/conf

Enable PHP and the web server:

        root@zabbix# svcadm enable nginx php74
root@zabbix# svcs nginx php74
STATE          STIME    FMRI
online         12:39:30 svc:/network/http:nginx
online         12:39:30 svc:/application/php74:default

Start the Zabbix services:

        root@zabbix# svcadm enable zabbix:server zabbix:agent
root@zabbix# svcs zabbix
STATE          STIME    FMRI
online         12:40:00 svc:/network/zabbix:server
online         12:40:08 svc:/network/zabbix:agent

You should now be able to point a web browser at the server and go through the initial Zabbix setup process:

Zabbix installer

On the database screen, set the type to Postgres via localhost. Enter the password that you set earlier during database creation:

Zabbix database

Once you get back to the login screen, enter Admin with a password of zabbix to get started:

Zabbix login


Any problems or questions, please get in touch.

How to determine PXE mac address when booting illumos via PXELinux/iPXE Minimal Solaris

In illumos, if you need to determine the interface that was used for booting via PXE, you can use the "boot-mac" property:

# /sbin/devprop -s boot-mac 
c:c4:7a:04:ef:2c
But this property is set by the illumos pxeboot. In some setups we use PXELinux to boot multiple illumos clients over PXE. For any illumos distribution, the "append" line in pxelinux.cfg looks like:
label omni PXE
kernel mboot.c32
append omni7/platform/i86pc/kernel/amd64/unix -B install_media=http://192.168.1.1/kayak.zfs.xz,install_config=http://192.168.1.1 --- omni7/miniroot
If you have a small number of clients, it's possible to just add each client's MAC address to the kernel line with -B boot-mac=<hardware-address>, but that doesn't work when you have hundreds of clients.

The PXELinux menu has an "ipappend 2" option, which appends "BOOTIF=<hardware-address-of-boot-interface>" to the kernel command line, but pxelinux puts BOOTIF at the very end of the "append" line, after the boot_archive, and the kernel does not recognise this variable after boot. There is no way to set something like -B BOOTIF dynamically here.
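
To illustrate (a hypothetical reconstruction, not a captured boot), with "ipappend 2" the entry and the resulting kernel command line would look roughly like this, with BOOTIF landing after the miniroot module where it's useless to us:

label omni PXE
kernel mboot.c32
append omni7/platform/i86pc/kernel/amd64/unix -B install_media=http://192.168.1.1/kayak.zfs.xz,install_config=http://192.168.1.1 --- omni7/miniroot
ipappend 2

# what the kernel actually receives (MAC address made up):
omni7/platform/i86pc/kernel/amd64/unix -B install_media=... --- omni7/miniroot BOOTIF=01-0c-c4-7a-04-ef-2c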

Fortunately, we can boot iPXE from the pxelinux menu. The DHCP configuration was updated to serve the iPXE menu once ipxe.lkrn has booted:
...
if exists user-class and option user-class = "iPXE" {
    filename "menu.ipxe";
} else {
    filename "pxelinux.0";
}

pxelinux.cfg/default:
...
label omni7
kernel ipxe.lkrn
menu.ipxe:
#!ipxe

kernel omni7/platform/i86pc/kernel/amd64/unix -B boot-mac=${netX/mac},install_media=http://192.168.1.1/omni7/kayak.zfs.xz,install_config=http://192.168.1.1/omni7
initrd omni7/miniroot
boot
iPXE lets us get the MAC address with the ${netX/mac} variable, so "boot-mac" will contain the MAC address of the interface that was used for booting via PXE.