Does ip belong to network? alp's notes

It's so easy to check whether an IP address belongs to a network... until you start doing it in shell. I tried, and finally got this. This version works in bash, dash and ksh. That's good enough for me, though it could perhaps be optimized a bit to avoid calling cut. The function takes two parameters: an IP address and a network in address/netmask (CIDR) format, for example 192.168.1.0/24. In effect we compare IPaddress & netmask with IPnetwork & netmask.


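belongs_network ()
{
    addr=$1
    network=$2

    netaddr=$(echo "$network" | cut -d / -f 1)
    netcdr=$(echo "$network" | cut -d / -f 2)

    a1=$(echo "$addr" | cut -d . -f 1)
    a2=$(echo "$addr" | cut -d . -f 2)
    a3=$(echo "$addr" | cut -d . -f 3)
    a4=$(echo "$addr" | cut -d . -f 4)

    n1=$(echo "$netaddr" | cut -d . -f 1)
    n2=$(echo "$netaddr" | cut -d . -f 2)
    n3=$(echo "$netaddr" | cut -d . -f 3)
    n4=$(echo "$netaddr" | cut -d . -f 4)

    # Pack the four octets of each address into one 32-bit integer
    aint=$(( ($a1 << 24) | ($a2 << 16) | ($a3 << 8) | $a4 ))
    nint=$(( ($n1 << 24) | ($n2 << 16) | ($n3 << 8) | $n4 ))

    # Build the netmask from the prefix length, then mask both addresses
    mask=$(( (0xFFFFFFFF << (32 - $netcdr)) & 0xFFFFFFFF ))
    ares=$(( $aint & $mask ))
    nres=$(( $nint & $mask ))

    if [ $ares -eq $nres ] ; then
        return 0
    fi
    return 1
}

# Example call; the address and network here are illustrative
if belongs_network 192.168.1.10 192.168.1.0/24; then
    echo "belongs"
else
    echo "does not belong"
fi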
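
For comparison, the same mask-and-compare idea can be sketched in C, where the bit arithmetic is more direct. This is only an illustration under stated assumptions: parse_ipv4 is a hypothetical helper, the C belongs_network borrows the shell function's name but is not the author's code, and the netmask is assumed to be given as a prefix length.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: parse dotted-quad "a.b.c.d" into a 32-bit value.
 * Returns 0 on success, -1 on malformed input. */
static int parse_ipv4(const char *s, uint32_t *out)
{
	unsigned int a, b, c, d;
	char trailing;

	/* Exactly four numbers and nothing after them. */
	if (sscanf(s, "%u.%u.%u.%u%c", &a, &b, &c, &d, &trailing) != 4)
		return -1;
	if (a > 255 || b > 255 || c > 255 || d > 255)
		return -1;
	*out = ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | d;
	return 0;
}

/* Returns 1 if addr is inside network ("address/prefix"), 0 if it is not,
 * and -1 if either argument cannot be parsed. */
int belongs_network(const char *addr, const char *network)
{
	char netaddr[16];
	unsigned int prefix;
	char trailing;
	uint32_t a, n, mask;

	assert(addr != NULL && network != NULL);
	if (sscanf(network, "%15[0-9.]/%u%c", netaddr, &prefix, &trailing) != 2)
		return -1;
	if (prefix > 32 || parse_ipv4(addr, &a) != 0 || parse_ipv4(netaddr, &n) != 0)
		return -1;

	/* A shift by 32 is undefined in C, so /0 is handled separately. */
	mask = prefix == 0 ? 0 : 0xFFFFFFFFu << (32 - prefix);
	return (a & mask) == (n & mask);
}
```

With this sketch, belongs_network("192.168.1.10", "192.168.1.0/24") yields 1, and an address outside the /24 yields 0.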

Creative xor Use Josef "Jeff" Sipek

Last month at work I got to try to optimize a function that takes a number and rounds it up to the next power of 2. The previous implementation used a simple loop. I didn’t dive into obscure bit twiddling, but rather used a helper function that is already in the codebase. Yes, I let the compiler do the heavy lifting of turning easy to understand code into good machine code. The x86 binary that gcc 6.3 produced has an interesting idiom, and that’s why I’m writing this entry.

The new code:

#include <stdint.h>

static inline unsigned int bits_required32(uint32_t num)
{
	return num == 0 ? 0 : 32 - __builtin_clz(num);
}

/* Returns x, such that x is the smallest power of 2 >= num. */
uint32_t nearest_power(uint32_t num)
{
	if (num == 0)
		return 1;

	return 1U << bits_required32(num - 1);
}

This is a slightly simplified version of the code, but it demonstrates the optimization quite well.
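
The loop that this code replaced is not shown. A minimal sketch of what such a loop might look like, under the hypothetical name nearest_power_loop, makes it easy to cross-check the clz-based version. Both are only meaningful for num up to 2^31, since a larger result would not fit in 32 bits. The clz-based functions are repeated so that the sketch compiles on its own with gcc or clang.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical loop-based version: double p until it reaches num.
 * Only valid for num <= 2^31, otherwise p would overflow. */
uint32_t nearest_power_loop(uint32_t num)
{
	uint32_t p = 1;

	assert(num <= UINT32_C(0x80000000));
	while (p < num)
		p <<= 1;
	return p;
}

/* The clz-based version, repeated here so the sketch is self-contained. */
static inline unsigned int bits_required32(uint32_t num)
{
	return num == 0 ? 0 : 32 - __builtin_clz(num);
}

uint32_t nearest_power(uint32_t num)
{
	if (num == 0)
		return 1;
	return 1U << bits_required32(num - 1);
}
```

Calling both functions over a range of inputs and comparing the results is a cheap way to gain confidence in the rewrite before looking at the generated code.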

The nearest_power function disassembles as:

    nearest_power:      8b 54 24 04        movl   0x4(%esp),%edx
    nearest_power+0x4:  b8 01 00 00 00     movl   $0x1,%eax
    nearest_power+0x9:  85 d2              testl  %edx,%edx
    nearest_power+0xb:  74 14              je     +0x14	<nearest_power+0x21>
    nearest_power+0xd:  83 ea 01           subl   $0x1,%edx
    nearest_power+0x10: 74 0f              je     +0xf	<nearest_power+0x21>
    nearest_power+0x12: 0f bd d2           bsrl   %edx,%edx
    nearest_power+0x15: b9 20 00 00 00     movl   $0x20,%ecx
    nearest_power+0x1a: 83 f2 1f           xorl   $0x1f,%edx
    nearest_power+0x1d: 29 d1              subl   %edx,%ecx
    nearest_power+0x1f: d3 e0              shll   %cl,%eax
    nearest_power+0x21: c3                 ret    

The first 6 instructions contain the prologue and deal with num being zero or one—both cases produce the result 1. The remaining 6 instructions make up the epilogue and are where the calculation happens. I’m going to ignore the first half of the function, since the second half is where the interesting things happen.

First, bsrl finds the index of the highest set bit in num - 1, and the value 32 is stashed in a register:

    nearest_power+0x12: 0f bd d2           bsrl   %edx,%edx
    nearest_power+0x15: b9 20 00 00 00     movl   $0x20,%ecx

The bit index (%edx) is in the range 0–31.

Here is the really interesting bit:

    nearest_power+0x1a: 83 f2 1f           xorl   $0x1f,%edx

This xors the bit index (i.e., 0–31) with 31, which turns it into the number of leading zeros that __builtin_clz is supposed to return. To decipher what this does, I find it easier to consider the top 27 bits and the bottom 5 bits separately.

    operand  binary
    0x1f     00000000 00000000 00000000 000 11111
    edx      00000000 00000000 00000000 000 xxxxx

The xor of the top bits produces 0 since both the constant 31 and the register containing any of the numbers 0–31 have zeros there.

The xor of the bottom bits negates them since the constant has ones there.

When combined, the xor has the same effect as this C expression:

out = (~in) & 0x1f;

This seems very weird and useless, but it is far from it. It turns out that for inputs 0–31 the above expression is the same as:

out = 31 - in;
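
The equivalence is easy to confirm by brute force. The following sketch (the names flip5 and xor31_matches_subtraction are made up for illustration) checks all 32 inputs and compares the xor, the mask form, and the subtraction.

```c
#include <assert.h>
#include <stdint.h>

/* What the xorl $0x1f instruction computes for a 5-bit input. */
uint32_t flip5(uint32_t in)
{
	assert(in <= 31);
	return in ^ 0x1f;
}

/* Returns 1 if all three forms agree for every input from 0 to 31. */
int xor31_matches_subtraction(void)
{
	for (uint32_t in = 0; in <= 31; in++) {
		uint32_t by_xor = flip5(in);
		uint32_t by_mask = (~in) & 0x1f;
		uint32_t by_sub = 31 - in;

		if (by_xor != by_mask || by_xor != by_sub)
			return 0;
	}
	return 1;
}
```

Outside the 0–31 range the identity breaks down: 32 ^ 31 is 63, while 31 - 32 wraps around as an unsigned value.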

I think it is really cool that gcc produced this xor instead of a less optimal multi-instruction version.

The remainder of the disassembly just subtracts and shifts to produce the return value.

Why xor?

I think the reason gcc (and clang for that matter) produce this sort of xor instruction instead of a subtraction is very simple: on x86 the sub instruction’s left hand side and the destination must be the same register. That is, on x86 the sub instruction works as:

x -= y;

Since the left-hand side doubles as the destination, it must be a register and cannot be the constant 31, so it isn’t possible to express out = 31 - in using just one sub.

Anyway, that’s it for today. I hope you enjoyed this as much as I did.

The commoditization of IT? The Trouble with Tribbles...

IT, so the story goes, is now a boring commodity. But is this true?

Let's first define what a commodity is. There are a range of definitions we could use, but I'm going to think of a commodity as something that is functionally undifferentiated and available from multiple sources. The key aspect here is that of interchangeability (aka fungibility).

As an example, most computer components fall into the commodity category. Memory DIMMs, disk drives, network interfaces - you can (in principle) use any vendor's disk drives or memory and your computer will still work. You can use a mouse, keyboard, or monitor from any vendor and things will work just fine. Vendors have to differentiate in other ways - performance, cost, reliability, service.

What about smartphones? I would say that the phone piece is a commodity. Whether for a mobile or a land line, you can switch your telephone for another make or model, and you can switch from one telephony provider to another.

But the smart part of smartphones isn't properly interchangeable. You can't simply swap an Apple handset for an Android and carry on as you were; you have to switch everything to a different domain. And the suppliers here are keen to enforce differentiation and prevent interchangeability. We live in a world of proprietary walled gardens.

In most non-trivial cases, databases aren't commodities. Big database companies rely on the fact that you couldn't migrate to another database vendor even if you wanted to.

Operating systems are clearly differentiated. You can't swap Solaris for Windows, or either for Linux or BSD. You can't even treat different distributions as commodities if you restrict yourself to the Linux domain.

The operating system landscape is changing a little, though, in that Docker and containerization offer the prospect of interchangeability - you could, in theory, run a Docker image anywhere and on anything.

Cloud computing definitely isn't a commodity. (Thinking of it as a utility might be slightly more accurate.) Heck, there are sufficient differences over what's available that migrating between different AWS regions isn't smooth, let alone migrating between cloud providers.

Vendor lock-in is the big thing, and it's diametrically opposed to being a commodity - what vendor wants to make it easy for its customers to leave? (Despite that being one of the key attractions of any vendor in practice.)

One of the requirements for interchangeability is standardization, and there's a tension here between standardizing things (thereby making things the same) and innovation, which necessarily implies change. I could (and probably will at some point) go on at length about innovation, but I see precious little innovation in practice, more constant reinvention of the square wheel. Meanwhile the standards we have are either efforts like POSIX, which is largely codifying accidental implementations from the 1970s, or ad-hoc emergence of initial implementations that were cobbled together with little or no thought for actual suitability.

Rather than commoditization being a standard base, with a rising tide lifting all boats, any commoditization chips away the good stuff to leave the lowest common denominator, while everyone deliberately introduces incompatibilities in the name of differentiation.

So it seems to me that, far from being commoditized, IT has been monopolized and mediocritized.

Tribblix - illumos for everyone? The Trouble with Tribbles...

When I was doing a bit of a branding exercise for Tribblix, part of which generated the rather amateurish logo I now have - something I needed to make some business cards and stickers to take to FOSDEM this year - one of the things I wondered about was a good tagline.

I ended up with "the retro illumos distribution". Of course, it was pointed out that illumos was retro enough on its own, so the idea of a retro variant was a bit unnecessary.

The other tagline I came up with was "illumos for everyone". I rejected it in the end because it was a bit preposterous - I'm not really building something for everyone.

Yet the underlying idea here was simple - that I would actively seek to build a distribution that was inclusive, not exclusive. That's why:

  • I have a SPARC version as well as x86
  • On x86, I support 32-bit as well as 64-bit systems
  • Tribblix is suitable for both desktop and server use
  • Tribblix is a flexible system, not an appliance or hypervisor
  • I work on ensuring Tribblix will work successfully on systems with more minimal resources than other distributions
  • A variety of installation methods are supported - media, network, iPXE
  • I've worked on installation in the cloud, both KVM-based and AWS, in addition to bare metal or other hypervisors
  • I've tried to make key features such as zones easier to configure and use

Generally, the idea is to reduce the barriers and limitations for installing and using Tribblix.

This summer, I came across a much better way of putting it. Rather than "illumos for everyone", a variation of the UK Labour Party's slogan expresses the idea much more elegantly. Tribblix would be "illumos for the many, not the few". It's a shame that the slogan is already taken, as it expresses the philosophical aim rather neatly.

Solaris to Linux Migration 2017 Brendan Gregg's Blog

Many people have contacted me recently about switching from Solaris (or illumos) to Linux, especially since most of the Solaris kernel team were let go this year (including my former colleagues, I'm sorry to hear). This includes many great engineers who I'm sure will excel in whatever they choose to work on next. They have been asking me about Linux because I've worked for years on each platform: Solaris, illumos, and Linux, in all cases full time and as a subject matter expert. I've also done some work on BSD, which is another compelling choice, but I'll discuss that another time. The following is my opinion and not an official guide to any OS.

Switching from Solaris to Linux has become much easier in the last two years, with Linux developments in ZFS, Zones, and DTrace. I've been contributing (out of necessity), including porting my DTraceToolkit tools to Linux, which also work on BSD. What follows are topics that may be of interest to anyone looking to migrate their systems and skillset: scan these to find topics that interest you.

## ZFS

ZFS is available for Linux via the [zfsonlinux] and [OpenZFS] projects, and more recently was included in Canonical's Ubuntu Linux distribution: Ubuntu Xenial 16.04 LTS (April 2016). It uses a Solaris Porting Layer (SPL) to provide a Solaris-kernel interface on Linux, so that unmodified ZFS code can execute.

My company uses ZFS on Linux in production, and I've been the go-to person for deep ZFS problems. It feels largely the same, except kstats are in /proc/spl/kstat/zfs/arcstats, and I debug it with Linux tracing tools instead of DTrace (more on that next). There have been some issues on Linux, but overall it's been ok, especially given how hard we push ZFS. We've used it for our container hosts (codename Titus) that do frequent snapshots, use send/recv, etc.
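
For scripts that want a single ARC counter, the kstat file mentioned above can be read directly. The sketch below assumes the usual kstat text layout of one "name type data" row per counter, and its function names (kstat_value, kstat_value_from_file) are hypothetical. It returns the value as a signed long long with -1 meaning "not found", which is a simplification for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Find one counter in kstat-style text (rows of "name type data").
 * Returns the value, or -1 if the name is not present. */
long long kstat_value(const char *text, const char *wanted)
{
	char name[64];
	unsigned int type;
	long long data;

	assert(text != NULL && wanted != NULL);
	while (*text != '\0') {
		/* Header rows either fail to parse or carry a different name. */
		if (sscanf(text, "%63s %u %lld", name, &type, &data) == 3 &&
		    strcmp(name, wanted) == 0)
			return data;
		/* Advance to the start of the next line. */
		text += strcspn(text, "\n");
		if (*text == '\n')
			text++;
	}
	return -1;
}

/* Read a whole kstat file (assumed to fit in 16 KB) and look up a counter. */
long long kstat_value_from_file(const char *path, const char *wanted)
{
	static char buf[16384];
	FILE *fp = fopen(path, "r");
	size_t n;

	if (fp == NULL)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, fp);
	fclose(fp);
	buf[n] = '\0';
	return kstat_value(buf, wanted);
}
```

On a host with ZFS loaded, a call such as kstat_value_from_file("/proc/spl/kstat/zfs/arcstats", "size") would be the intended use; which counter names exist depends on the ZFS version.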
I think the ARC memory counters need more work, as people keep capping the ARC to avoid keeping memory from applications, and the ARC should already handle that (with the exception of massive allocations). There's also a ZFS send/recv code path that should try to use the TASK\_INTERRUPTIBLE flag (as suggested by a coworker), to avoid a kernel hang (can't kill -9 the process). Both of those should be easy fixes. There are plenty of other bugs to fix, though, which you can see in the issue list on github.

Linux has also been developing its own ZFS-like filesystem, btrfs. Since it's been developed in the open (unlike early ZFS), people tried earlier ("IS EXPERIMENTAL") versions that had serious issues, which gave it something of a bad reputation. It's much better nowadays, and has been integrated in the Linux kernel tree (fs/btrfs), where it is maintained and improved along with the kernel code. Since ZFS is an add-on developed out-of-tree, it will always be harder to get the same level of attention.

We're now testing container hosts in production on btrfs, instead of ZFS. Facebook have been using btrfs for a while in production, and key btrfs developers now work at Facebook and continue its development. There is a [btrfs status page], but for the latest in development see btrfs posts to the [linux kernel mailing list] and btrfs sections on [kernelnewbies].

## Observability

Here's the big picture of performance observability tools on Linux, from my [Linux performance] page, where I also have diagrams for other tool types, as well as videos and slides of prior Linux performance talks:

I also have a USE Method: Linux Performance Checklist, as a different way to navigate and apply the tools.

Linux has many more text interfaces in /proc than Solaris does, which help for ad hoc debugging. It sounds inefficient, but I've never seen /proc readers show up in CPU flame graphs.

## DTrace

Linux 4.9 provides the raw capabilities to implement DTrace-like scripts, allowing me to port over many of my DTraceToolkit scripts (they also work on BSD). The hardest part on Linux is now done: kernel support. I wrote about it in a previous post, [DTrace for Linux 2016]. You might also like my [Give me 15 minutes and I'll change your view of Linux tracing] video as an introduction to the different built-in Linux tracers.

Nowadays, there are three built-in tracers that you should know about:

- **ftrace**: since 2008, this serves many tracing needs, and has been enhanced recently with hist triggers for custom histograms. It's fast, but limited in places, and usually only suited as a single-user tool (there are workarounds). I wrote an ftrace toolkit, [perf-tools], and the article [Ftrace: the hidden light switch].
- **perf**: since 2009, this started as a PMC profiler but can do tracing now as well, usually in a dump-and-post-process style. It's the official profiler. I wrote a page on it: [perf].
- **eBPF**: tracing features completed in 2016, this provides efficient programmatic tracing to existing kernel frameworks. Many new tools can now be written, and the main toolkit we're working on is [bcc].

Here's some output from my zfsdist tool, in bcc/BPF, which measures ZFS latency as a histogram on Linux:
# zfsdist
Tracing ZFS operation latency... Hit Ctrl-C to end.

operation = 'read'
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 4479     |****************************************|
         8 -> 15         : 1028     |*********                               |
        16 -> 31         : 14       |                                        |
        32 -> 63         : 1        |                                        |
Linux has been adding tracing technologies over the years: kprobes (kernel dynamic tracing), uprobes (user-level dynamic tracing), tracepoints (static tracing), and perf\_events (profiling and hardware counters). The final piece was enhanced BPF (aka eBPF: enhanced Berkeley Packet Filter), which provided the custom in-kernel programmability needed for an advanced tracer, created by Alexei Starovoitov (now at Facebook).

There's a front-end for BPF, [bcc] \(BPF Compiler Collection\), which has many single- and multi-purpose tools written by me and others. Check it out. It's currently much more difficult to write a bcc/BPF script than a DTrace script, but at least it's now possible (using Linux built-ins), and one day there might be an easier front-end. I have a page on eBPF tracing, and the current bcc/BPF tools are:
There have been other tracing projects for Linux, and some companies found them useful for their needs, but the big problem was that they weren't merged in mainline Linux. Now that eBPF has been, many of these tracing projects may switch to using it as a backend since it is stable, or, they could further specialize in what they do (non-BPF related), eg, offline analysis of a capture file (LTTng, sysdig).

If you're on an older Linux kernel (3.x), you can use ftrace for some tracing needs. My [perf-tools] includes single purpose tools like opensnoop, execsnoop, iosnoop, and more, and multi-purpose tools like funccount, kprobe, and uprobe. I intended perf-tools as a hacky workaround until eBPF was available, but ftrace has since been developed further (hist triggers) so perf-tools may have a reason to continue.

## Zones

I'd recommend this post about [Zones vs Containers] by Jessie Frazelle. On Linux, containers are a combination of namespaces (restriting what a process sees) and cgroups (similar to Solaris resource controls). People rarely create them manually. They use third-party software like Docker or Kubernetes to simplify their administration. I gave a talk about container performance recently at DockerCon, and included a quick summary of how they work: YouTube, SlideShare:
If you search SlideShare and YouTube, you'll find many other good talks on containers as well. Apart from Jessie, I also like talks by Jérôme Petazzoni, and Tejun Heo.

Linux containers have been in rapid development in the last few years. It's the container team at my employer that runs the newest Linux kernels, since they need the latest features and fixes, and you should try to run the newest as well. Currently that means at least Linux 4.9.

There's a lot about Linux containers that isn't well documented yet, especially since it's a moving target. (Zones lacked docs when they came out too, which is why I wrote the first Zones resource control docs.) Search for recent blog posts on Linux containers, and try them out, and you'll piece together their capabilities and workings bit by bit. Here are some documents for understanding internals:

- Linux Namespaces from Wikipedia
- Linux Cgroups from Wikipedia
- Documentation/cgroup-v1 from the Linux source
- Documentation/cgroup-v2.txt from the Linux source

One feature Linux containers lack is a container ID in the kernel. It's been proposed on lkml, but the patches have not been integrated yet (it was last discussed two weeks ago). Some people argue that the kernel shouldn't have one, since a container is a collection of namespaces and cgroups defined in user-space (by Docker, etc), and it's therefore up to user-space to track it. As a performance engineer who does kernel tracing, I find the lack of an ID I can trace in the kernel to be pretty annoying. There are workarounds: I can use the perf\_events cgroup ID, provided the container software is configuring it (they do).

Some specific differences that got my attention: you can access a container's mount namespace from the host (global zone) via /proc/PID/root, given a PID in a container. But understanding if a PID belongs to a container is surprisingly difficult: there's no -Z option to tools like ps, since there's no container ID in the kernel.
From the host (global zone), given PID 18300:
host# grep NSpid /proc/18300/status
NSpid:	18300	1
host# grep 18300 /sys/fs/cgroup/perf_event/*/*/tasks
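
The NSpid check can also be done programmatically. The NSpid line lists the process's PID in each PID namespace it belongs to, outermost first, so more than one value means the process is nested inside another PID namespace. A minimal sketch, operating on the text of /proc/PID/status and using the hypothetical function name nspid_depth:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Count the PIDs on the "NSpid:" line of /proc/PID/status text.
 * 1 means the process is only in the host PID namespace, more than 1
 * means it is nested; 0 means no NSpid line was found. */
int nspid_depth(const char *status_text)
{
	static const char key[] = "NSpid:";
	const char *p = status_text;
	int count = 0;

	assert(status_text != NULL);

	/* Find "NSpid:" at the start of a line. */
	while ((p = strstr(p, key)) != NULL) {
		if (p == status_text || p[-1] == '\n')
			break;
		p += sizeof(key) - 1;
	}
	if (p == NULL)
		return 0;
	p += sizeof(key) - 1;

	/* Count the whitespace-separated numbers on the rest of the line. */
	for (;;) {
		char *end;

		while (*p == ' ' || *p == '\t')
			p++;
		if (*p < '0' || *p > '9')
			break;
		(void)strtol(p, &end, 10);
		count++;
		p = end;
	}
	return count;
}
```

For the status text of PID 18300 shown above, this returns 2. A nested PID namespace is only a hint that a container is involved, not proof of one.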
The first command shows that PID 18300 is really PID 1 in another process namespace: a telltale sign it's in a container. I also checked a task list from /sys/fs/cgroup, and saw it's in a docker cgroup. I've suggested adding a command to docker to make listing at least the top-level PIDs in containers easier.

## Virtual Machines

The two main technologies on Linux are Xen and KVM (and there's Bhyve for BSD). Xen is a type 1 hypervisor that runs on bare metal, and KVM is type 2 that runs as processes in a host OS. Oracle VM Server is based on Xen.

Xen's biggest user is the Amazon EC2 cloud, which has over one million customers, and appears to be a custom version (it self identifies as version ""). Outside of EC2, many other providers are deploying on KVM.

Both Xen and KVM have had many performance and security improvements, and workloads can now be tuned to run at almost bare metal speeds (say, a 3% loss or less). At my employer we sometimes use SR-IOV for direct network interface access, and NVMe for direct disk access. Some years ago, it was easy to make the case to switch from VMs to containers due to the performance improvements alone, as VMs had to emulate everything. Not so today, although this comes at the cost of complexity and required tunables.

In general, I find Xen more complicated to work with than KVM. (FWIW, I contributed some patches to Xen to allow a subset of PMCs to be accessed.) If you switch from managing Oracle VM to Xen, it will hopefully feel very similar. If you switch to KVM, it will be quite different, but hopefully easier.

## SMF

I personally introduced hundreds of customers to SMF while teaching Solaris 10 classes. I came up with a great demo where I could break Solaris 9 and 10 servers in the same way, then demonstrate how it would take an hour and a reboot cycle to fix Solaris 9, but minutes and no reboot to fix Solaris 10. I also wrote and published an entertaining SMF manifest that played music.
A lot of people got it and learned to love SMF. But some still hated it and the new universe of stuff one had to learn. Some vowed to remain on Solaris 9 forever in protest, or to switch to Linux.

Linux is now going through this with systemd, and has its own share of systemd critics, encouraging distros to remove systemd. I suspect it will prevail, just as SMF did. There are many implementation differences, but the same general idea: a coordinated system to manage parallel application startup and dependency state.

If you absolutely can't stand systemd or SMF, there is BSD, which doesn't use them. You should probably talk to someone who knows systemd very well first, because they can explain in detail why you should like it.

## Performance

Linux should be faster out of the box for many production workloads, due to improvements in scheduling (including a tickless kernel), driver support, newer syscalls features, newer TCP feature support, processor optimizations (often provided by Intel engineers directly), a lazy TLB, and more. There's also better compiler and application support: in some cases applications run faster on Linux, not because the kernel is faster, but because that compilation target has had more attention. I've even seen cases where the Makefile compiles on Linux with -O3, and Solaris with -O0, thus crippling Solaris performance, for no legitimate reason.

How much Linux is faster depends on the workload: I'd expect between zero and a few percent typically. There are some extreme cases, where a lack of proper driver support on Solaris can have Linux run 10x faster. I'd also expect you could still find a workload where Linux is slower. For example, although it's very minor, the /dev/\*random devices were faster on Solaris last time I checked, as Linux was spending more effort on entropy for improving security. (Or, from a different point of view, Solaris was less secure.)
I spoke about the performance differences in my 2014 SCALE keynote "What Linux can learn from Solaris performance and vice-versa" ([slides]) where the conclusion was that Linux may run faster out of the box, but I could typically make Solaris run much faster thanks to optimizations found using DTrace. DTrace didn't exist for Linux at the time, but now we have BPF (see previous section). There have been many other improvements to Linux since then, as well.

## Security

Key Linux security technologies to learn:

- AppArmor: application access control
- seccomp: secure computing mode, restricts system call usage
- SELinux: Security-Enhanced Linux, for access control and security policies (alternate to apparmor)
- Linux audit: event logging
- eBPF (which is used to enhance seccomp)
- iptables: network firewalling
- LSM: Linux Security Modules

There are many more: browse the release notes on [kernelnewbies]. Live kernel patching is another capability, that is currently being integrated in the 4.x series. And namespaces, used for Linux containers, are also a relevant technology.

There have been security vulnerabilities, just like there are with any software. This is especially true for Linux, which is used everywhere and has a lot of attention. The way the cloud is used helps with security: most instances at my employer have only been up for one or two days. We're constantly creating and destroying instances from a base image, which means that when we update that base image with security patches, they get rolled out very quickly.

## Reliability

Our production servers, running Ubuntu, have been rock solid. In over three years, I've only seen three kernel panics, for an enormous deployed fleet (tens of thousands of Linux instances). Working on Solaris, I'd usually see several different panics per year. I would not attribute this to, say, a more limited range of workloads at my company: we have a wide range of different things running internally.
I would, however, attribute some of it to our virtualized environment, running virtual machines: the hypervisor will handle some hardware problems before the guest kernel sees them, which I suspect helps us avoid some hardware-related panics.

In a test environment, I've seen several more Linux panics in the past three years. Five of those were my own kernel bugs, when I was doing kernel development. Two others were on the latest "release candidate" (-rc) kernel from [] – the bleeding edge of kernel development. If you do run the latest -rc kernel and hit a bug, please share on the Linux kernel developers mailing list ([lkml]) where it should be quickly fixed.

In production, people often stick to the Long Term Support (LTS) kernel releases. Vendors (see later section) are usually quick to make sure these LTS kernel releases have all the right patches and are reliable. My rock solid experience with Ubuntu is on an LTS release.

## Crash Dump Analysis

It can be done. One technique is kdump, which uses a capture kernel configured in grub. Execution switches to the capture kernel during a panic, so that a working kernel can capture the state of the system. I've set this up a few times and successfully debugged kernel panics.

It's worth describing what commonly happens with Linux kernel panics. In an environment like ours (patched LTS kernels running in VMs), panics are rare. The odd time we hit them, we'll take the "oops message" – a dump of the kernel stack trace and other details from the system log – and search the Internet. We almost always find that someone else has hit it and had it fixed, and so then we track the patch to the kernel update and deploy that. There's so many people running Linux, and given that we're usually on LTS and not the release candidates, it's rare that we're the first to hit a panic.
For that rare case where we are first to hit a panic: by posting the entire oops message to the right mailing list, the responsible engineer will usually fix it quick (by figuring out how to reproduce from the oops message alone), and then we track their patch into a kernel update. That mailing list would be lkml if we're running the latest rc (only in test), or the mailing list identified in the MAINTAINERS file (more on that later). In Solaris, we'd only really do panic analysis given a crash dump, but Linux gets lots of mileage from the oops message alone. Just from a quick search, see this presentation PDF, which digs into oops message components.

Another difference: kernel panics don't always reboot the system. Linux can oops and kill a process, but not reboot if it doesn't think it needs to, instead leaving it up so you can login and debug. It's also why you should always run "dmesg" at the start of any investigation, to check if the system (that's still up!) has in fact oops'd.

As for hitting a panic for the first time, posting an oops message, but finding no one wants to fix it: I haven't seen that yet in 3 years. The day it does happen, I'll set up the capture kernel, get a crash dump, and do the analysis myself. You can also ask your Linux OS vendor, if you are paying one for support.

## Debugging Tools

Instead of mdb you'll be using gdb, which has been improving, and even has a TUI mode nowadays. I wrote gdb Debugging Full Example (Tutorial), and I also recommend you watch Greg Law's talk Give me 15 minutes and I'll change your view of GDB.

I've noticed that newer projects are using lldb. Here's an lldb to gdb command map.

## Other Tools

If you never found this before, it's been a great resource over the years: the Rosetta stone of Unix, from which you can draw a table of just Linux and Solaris.
There's also a new effort to recreate it. Oracle have a similar useful page as well: the Linux to Oracle Solaris 11 comparison, as well as a [procedure] for migrating from Solaris to Linux.

A few other tool differences that stood out to me:

- syscall tracing: truss > strace
- packet sniffing: snoop > tcpdump
- process tree: ptree > pstree -ps
- kernel tuning: ndd > sysctl
- binary dumping: elfdump > objdump
- kernel module list: modinfo > lsmod
- swap status (swap often isn't used): swap > swapon

## Other Kernel Differences

Linux supports overcommit: instead of guaranteeing that all virtual memory can be stored when needed, including on swap devices, Linux bets that it won't need to, so allows more virtual memory allocations than it could possibly store. This means that malloc() almost always returns successfully, so much so that some programmers on Linux don't bother checking its return value. What happens if processes really do try to populate all that virtual memory? The system runs out, and the kernel's out-of-memory killer (OOM killer) will pick a sacrificial process and kill it. If that seems wildly unacceptable, note that you can tune overcommit on Linux to not do this, and behave more like Solaris (see sysctl vm.overcommit\_memory).

I covered various kernel differences in my SCALE 2014 talk What Linux can learn from Solaris performance and vice-versa, and of course my book Systems Performance: Enterprise and the Cloud where I cover both Linux and Solaris.

## OS Vendors and Paying for Linux

If you're already an Oracle customer and switch to Linux, then there is Oracle Linux. Other vendors who offer support include Red Hat, Canonical, and SUSE.

However, most companies don't pay for Linux. How does it get developed? Often companies want features and will develop and upstream them to meet their own needs. But once it's part of mainline Linux, their contribution may end there.
There may be no real documentation written, no marketing of the feature, and no education of the community. Just code that appears in the Linux source because IBM/Cisco/Hitachi/whoever needed it there for their own internal project. This lack of supporting efforts can make learning Linux capabilities more challenging.

## Linux Kernel Engineering

If you want to get into Linux kernel development, you'll need to get familiar with Coding Style, Submitting Patches, and the Submit Checklist. You could also read On submitting kernel patches (see section 14.1 and imagine how different Solaris would be if Linux accepted that patch!).

There are also many blog posts on how to compile the Linux kernel and submit your first patch, just search for "compiling the Linux kernel". It can be menu driven or automated. Just as an example, here's my build script for automating Linux kernel builds for my custom EC2 environment (it's custom, you don't want to use it, just giving you an idea).

You'll especially want to understand the MAINTAINERS file. It's very unlikely you'll be submitting patches to Linus Torvalds (nor the github repo, read why). You'll almost always be sending your patches to "maintainers", who will do code review and then pass your patch on to Linus. There are over one thousand subsystems in Linux (many for device drivers), each has one or more maintainers. Maintainers make the day-to-day decisions in Linux development.

Apart from reading the MAINTAINERS file (which includes a legend at the top), you can query it. Eg, to see who maintains tcp\_output.c:
linux$ ./scripts/ -f net/ipv4/tcp_output.c
"David S. Miller"  (maintainer:NETWORKING [IPv4/IPv6])
Alexey Kuznetsov  (maintainer:NETWORKING [IPv4/IPv6])
James Morris  (maintainer:NETWORKING [IPv4/IPv6])
Hideaki YOSHIFUJI  (maintainer:NETWORKING [IPv4/IPv6])
Patrick McHardy  (maintainer:NETWORKING [IPv4/IPv6]) (open list:NETWORKING [IPv4/IPv6]) (open list)
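To give a feel for what such a query does, here is a minimal sketch in Python. It is not the kernel's own script. It assumes only the basic layout described in the legend at the top of MAINTAINERS: a subsystem title followed by tagged lines, where `M:` is a person, `L:` is a mailing list, and `F:` is a file pattern. The sample entries, names, and addresses below are made up for illustration, and the matching rules are simplified.

```python
import fnmatch

# Made-up entries in the style of the MAINTAINERS file.
# These names and addresses are placeholders, not real maintainers.
SAMPLE = """\
EXAMPLE NETWORKING
M: Jane Doe <jane@example.org>
M: John Roe <john@example.org>
L: netdev@example.org
F: net/ipv4/
F: include/net/ip*

EXAMPLE STORAGE
M: Sam Poe <sam@example.org>
L: storage@example.org
F: drivers/example/
"""


def parse_entries(text):
    """Split text into entries: a title line followed by 'X: value' lines."""
    entries = []
    for block in text.strip().split("\n\n"):
        lines = block.splitlines()
        entry = {"title": lines[0], "M": [], "L": [], "F": []}
        for line in lines[1:]:
            tag, _, value = line.partition(":")
            if tag in ("M", "L", "F"):
                entry[tag].append(value.strip())
        entries.append(entry)
    return entries


def pattern_matches(pattern, path):
    """A trailing '/' covers everything under that directory;
    anything else is treated as a shell-style wildcard (a simplification)."""
    if pattern.endswith("/"):
        return path.startswith(pattern)
    return fnmatch.fnmatch(path, pattern)


def who_maintains(text, path):
    """Return (person, subsystem title) pairs whose F: patterns match path."""
    result = []
    for entry in parse_entries(text):
        if any(pattern_matches(p, path) for p in entry["F"]):
            result.extend((m, entry["title"]) for m in entry["M"])
    return result


for person, subsystem in who_maintains(SAMPLE, "net/ipv4/tcp_output.c"):
    print(f"{person} (maintainer:{subsystem})")
```

The real file has more tags and more matching rules than this, so treat the sketch as a reading aid for the file format and not as a replacement for the kernel's tooling.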
The MAINTAINERS file also shows the mailing lists for each subsystem. Patches often get hashed out there and polished long before they are sent by the maintainer to Linus on lkml.

The kernel development cycle: It begins with a new release (eg, 4.13), and then every Sunday (or whenever Linus decides) a release candidate is posted for the next version. So there'll be 4.14-rc1, then 4.14-rc2, etc, usually up to -rc7 or -rc8, which will be the final release candidates, and then Linus will cut the next release (eg, 4.14). All major changes are supposed to go in the first or second release candidate, and then minor bug fixes by rc7. For example, Linus just released 4.13, saying:
So last week was actually somewhat eventful, but not enough to push me
to delay 4.13.

Most of the changes since rc7 are actually networking fixes, the bulk
of them to various drivers. With apologies to the authors of said
patches, they don't look all that interesting (which is definitely
exactly what you want just before a release).  Details in the appended
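
As a rough illustration of the naming scheme and the rule of thumb described here, the following Python sketch parses names such as "4.13" or "4.13-rc7" and reports which stage of the cycle they fall in. The function names and the stage labels are my own, and the cut-off (major changes by rc1 or rc2, fixes only from rc3 onwards) simply encodes the description in this post, not any official policy.

```python
import re

# Matches names such as "4.13" (a release) or "4.13-rc7" (a release candidate).
TAG = re.compile(r"^(\d+)\.(\d+)(?:-rc(\d+))?$")


def parse_tag(tag):
    """Return (major, minor, rc), where rc is None for a final release."""
    m = TAG.match(tag)
    if m is None:
        raise ValueError(f"unrecognised tag: {tag!r}")
    major, minor, rc = m.groups()
    return int(major), int(minor), (int(rc) if rc is not None else None)


def stage(tag):
    """Classify a tag using the rule of thumb from the text (labels are hypothetical)."""
    _, _, rc = parse_tag(tag)
    if rc is None:
        return "released"
    # Major changes are expected by rc1 or rc2; later candidates take fixes.
    return "major changes" if rc <= 2 else "bug fixes only"


for name in ("4.13-rc1", "4.13-rc3", "4.13-rc7", "4.13"):
    print(name, "->", stage(name))
```
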
If you make some major changes or feature additions, and Linux is currently on rc3 or later, your patches are unlikely to be integrated in that release. The maintainers will hold on to them: they often have their own forks of Linux for this purpose.

As for brilliant jerks: Linux has them. So did Solaris. You know what I mean: the difference between saying "this code is idiotic" (probably ok) and "you are an idiot" (probably not ok). I don't believe in such behavior, and I think it's even more problematic for Linux given so many volunteers who could choose to do something else if pushed the wrong way. Fortunately, my own experience with Linux has been largely positive.

To get started on Linux kernel development, I'd subscribe to lkml and other lists, then offer to code review and test patches that you see posted. A lot of people are writing code, but fewer offering to help code review and test (and write docs). This should be an easy way to get started, build some credibility, and make valuable contributions. Sometimes good patches are posted and slip through the cracks, so replying with "I tested it, it works, thanks!" can help get things integrated, and the engineers will be grateful for your help.

## Community & Experts

The Linux community is massive. Here are areas you should know about:

- [kernelnewbies]: Posts changelogs for each Linux release (eventually), highlighting major and minor additions.
- [lkml]: The Linux Kernel Mailing List. This is the final staging ground for patches to be integrated into Linux, so following this will let you see what's happening right now. Be warned that it's high volume, and there are only a few reasons you should ever post there:
  1. you are submitting a patch set and the MAINTAINERS file told you to CC lkml;
  2. you are providing constructive expert comments on someone else's patch set, ideally after you tested it; or
  3. you're running the latest -rc from (or github/torvalds/linux) and hit a bug/panic.
- []: The best news feed of what's happening in Linux. It requires a subscription to read the latest articles, but if Linux is going to be a big part of your job, it's worth it.

Many of the experts in the Linux community are maintainers, as listed in the MAINTAINERS file. It's rare to bump into a maintainer: those I know keep their heads down working on lkml and the sublists, and usually avoid blog posts, meetups, and conferences (with some exceptions, like Linux Plumbers: an exclusive event for kernel engineers only, NetDev, and Kernel Recipes. Which reminds me: I'm helping run the tracing micro conference at Plumbers this year, and I'm also speaking at Kernel Recipes, so if you manage to make it to either, I'll see you there.)

There was a phenomenon in Solaris where we, the engineers, began doing our own marketing and evangelism, out of desperation to save Solaris. I've never found that happening in Linux, where there's the belief that Linux is too big to fail. I think that is a weakness of Linux. When I first joined Sun in 2001, it was believed that Sun was too big to fail, as well. Nowadays, Sun is a cobweb-covered sign at the Facebook Menlo Park campus, kept as a warning to the next generation.

## Documentation

The best documentation is under /Documentation in the kernel source (online at [] or [github/torvalds]). That documentation is correct but terse, written by the engineers as they commit code. Full documentation for features, such as would be published by Sun, is often non-existent. Even the major releases can go undocumented for weeks until someone writes a summary on [kernelnewbies], which is about as close to official release notes as you can get. I think the problem is a lack of paid tech writers working on Linux. Who would pay them? This lack of documentation makes learning and discovering new Linux features difficult.

## Other Differences

- **Packaging**: on Ubuntu (and similar) use "apt", on Red Hat (and similar) use "yum". They make it very easy to install packages, and automatically handle dependencies.
- **Driver & Platform Support**: Linux runs on practically everything.
- **Server Application Support**: Linux is usually the development environment for server applications, where things are most likely to work.
- **Desktop Support**: I miss CDE and dtksh. There are a lot of options on Linux, but I'm out of touch with current desktop environments, so can't recommend any.

## Job Market

The Linux job market has been much healthier for a while, and is growing. Solaris vs Linux jobs in the UK:


But there's another factor at play: jobs are also migrating from both Solaris and Linux to cloud jobs instead, specifically AWS. From another tracker, for the US:


The market for OS and kernel development roles is actually shrinking a little. The OS is becoming a forgotten cog in a much larger cloud-based system. The UK tracker plots the growth in AWS jobs clearly:

The job growth is in distributed systems, cloud SRE, data science, cloud network engineering, traffic and chaos engineering, container scheduling, and other new roles. While you might be considering switching to an equivalent Linux or BSD job, you should also consider a new role if that interests you. Leapfrogging to the next big thing was one of Deirdré Straughan's suggestions in Letting Go of a Beloved Technology.

I suspect at some point there'll be more jobs supporting the AWS cloud than there will be supporting Linux. If you choose this route, AWS makes it very easy to create servers (you just need a credit card) and learn how to use them. Which is also why it's been so successful: developers can create servers when they want them, without having to ask and wait for the system administration team.

As for companies: I can recommend Netflix, which has a culture that works really well.

If you stay working on the OS and kernel, there are still many jobs in support and development, and always will be. Large companies (like the one I work for) have OS teams to look after patching, releases, and performance. Appliance manufacturers hire kernel engineers to develop custom features, including storage appliances. There are several ZFS-based startups, who would appreciate your experience on ZFS.

## Good Luck

This is the post I wish someone had written for me when I made the switch. The first few months were the hardest. It gets easier. It will also become easier if you contribute to Linux, or BSD, and fix the annoying things you discover.

Solaris may not survive, but certain technologies and expertise will. Here's the Sun I'd like to remember: lots of smart people, having fun, and doing great work (music might not play outside the US):

RIP, Sun. Good luck to all, and let the spirit of great engineering live on in the next projects you choose.

Thanks to Deirdré Straughan for edits. Now to write a BSD version of this post...

The sudden death and eternal life of Solaris The Observation Deck

As had been rumored for a while, Oracle effectively killed Solaris on Friday. When I first saw this, I had assumed that this was merely a deep cut, but in talking to Solaris engineers still at Oracle, it is clearly much more than that. It is a cut so deep as to be fatal: the core Solaris engineering organization lost on the order of 90% of its people, including essentially all management.

Of note, among the engineers I have spoken with, I heard two things repeatedly: “this is the end” and (from those who managed to survive Friday) “I wish I had been laid off.” Gone is any of the optimism (however tepid) that I have heard over the years — and embarrassed apologies for Oracle’s behavior have been replaced with dismay about the clumsiness, ineptitude and callousness with which this final cut was handled. In particular, that employees who had given their careers to the company were told of their termination via a pre-recorded call — “robo-RIF’d” in the words of one employee — is both despicable and cowardly. To their credit, the engineers affected saw themselves as Sun to the end: they stayed to solve hard, interesting problems and out of allegiance to one another — not out of any loyalty to the broader Oracle. Oracle didn’t deserve them and now it doesn’t have them — they have been liberated, if in a depraved act of corporate violence.

Assuming that this is indeed the end of Solaris (and it certainly looks that way), it offers a time for reflection. Certainly, the demise of Solaris is at one level not surprising, but on the other hand, its very suddenness highlights the degree to which proprietary software can suffer by the vicissitudes of corporate capriciousness. Vulnerable to executive whims, shareholder demands, and a fickle public, organizations can simply change direction by fiat. And because — in the words of the late, great Roger Faulkner — “it is easier to destroy than to create,” these changes in direction can have lasting effect when they mean stopping (or even suspending!) work on a project. Indeed, any engineer in any domain with sufficient longevity will have one (or many!) stories of exciting projects being cancelled by foolhardy and myopic management. For software, though, these cancellations can be particularly gutting because (in the proprietary world, anyway) so many of the details of software are carefully hidden from the users of the product — and much of the innovation of a cancelled software project will likely die with the project, living only in the oral tradition of the engineers who knew it. Worse, in the long run — to paraphrase Keynes — proprietary software projects are all dead. However ubiquitous at their height, this lonely fate awaits all proprietary software.

There is, of course, another way — and befitting its idiosyncratic life and death, Solaris shows us this path too: software can be open source. In stark contrast to proprietary software, open source does not — cannot, even — die. Yes, it can be disused or rusty or fusty, but as long as anyone is interested in it at all, it lives and breathes. Even should the interest wane to nothing, open source software survives still: its life as machine may be suspended, but it becomes as literature, waiting to be discovered by a future generation. That is, while proprietary software can die in an instant, open source software perpetually endures by its nature — and thrives by the strength of its communities. Just as the existence of proprietary software can be surprisingly brittle, open source communities can be crazily robust: they can survive neglect, derision, dissent — even sabotage.

In this regard, I speak from experience: from when Solaris was open sourced in 2005, the OpenSolaris community survived all of these things. By the time Oracle bought Sun five years later in 2010, the community had decided that it needed true independence — illumos was born. And, it turns out, illumos was born at exactly the right moment: shortly after illumos was announced, Oracle — in what remains to me a singularly loathsome and cowardly act — silently re-proprietarized Solaris on August 13, 2010. We in illumos were indisputably on our own, and while many outsiders gave us no chance of survival, we ourselves had reason for confidence: after all, open source communities are robust because they are often united not only by circumstance, but by values, and in our case, we as a community never lost our belief in ZFS, Zones, DTrace and myriad other technologies like MDB, FMA and Crossbow.

Indeed, since 2010, illumos has thrived; illumos is not only the repository of record for technologies that have become cross-platform like OpenZFS, but we have also advanced our core technologies considerably, while still maintaining the highest standards of quality. Learning from some of the mistakes of OpenSolaris, we have a model that allows for downstream innovation, experimentation and differentiation. For example, Joyent’s SmartOS has always been focused on our need for a cloud hypervisor (causing us to develop big features like hardware virtualization and Linux binary compatibility), and it is now at the heart of a massive buildout for Samsung (who acquired Joyent a little over a year ago). For us at Joyent, the Solaris/illumos/SmartOS saga has been formative in that we have seen both the ill effects of proprietary software and the amazing resilience of open source software, and it very much informed our decision to open source our entire stack in 2014.

Judging merely by its tombstone, the life of Solaris can be viewed as tragic: born out of wedlock between Sun and AT&T and dying at the hands of a remorseless corporate sociopath a quarter century later. And even that may be overstating its longevity: Solaris may not have been truly born until it was made open source, and — certainly to me, anyway — it died the moment it was again made proprietary. But in that shorter life, Solaris achieved the singular: immortality for its revolutionary technologies. So while we can mourn the loss of the proprietary embodiment of Solaris (and we can certainly lament the coarse way in which its technologists were treated!), we can rejoice in the eternal life of its technologies — in illumos and beyond!

Total Solar Eclipse 2017 Brendan Gregg's Blog

While all eyes and instruments were aimed at the total eclipse of August 21st, I had a video camera pointed in the opposite direction. I was hoping to catch changes in the landscape, perhaps the Moon's shadow as it raced by. This was filmed from a road northeast of Madras, Oregon, almost right on the center line. It was just an experiment that I wasn't going to share, but when I saw it afterwards I thought it might interest some. Below you'll find time lapse and full videos, frames, and a montage.

## Videos

Time lapse 60x (1 second equals 1 minute) (youtube):

Full video (youtube):

These videos start 6 minutes before the total eclipse. I had aimed the camera at the landscape, to look for the Moon's shadow, and not the sky which looked hazy and boring. But that changed during the eclipse.

While the image does what I saw justice, the audio captured does not: it became eerily quiet before the eclipse, adding to the anticipation. The video camera, however, auto-adjusted the audio gain, and a distant engine sounds louder and louder. In person it was very quiet.

## Frames

Before partial, -36 mins, -10 mins:

The partial eclipse lasted 73 minutes. The above frames show before the partial, 36 minutes to total (C2), and 10 minutes to total. During this time, not much looked different to the naked eye as it continually adjusted to the decreasing light levels (increasing pupil size). Most noticeable was the temperature of sunlight on skin, which changed from hot direct sunlight to cool direct sunlight. (I'd like to know: at what time during a partial eclipse is the sunlight equivalent to that on Mars?)

The video camera was automatically adjusting to lower light levels as well, resulting in an image that resembles the naked eye view. It likely helped that I was using zoom to frame the shot, which means less light on the sensor than a wide-angle shot, causing the camera's image to darken sooner, at around the same time as the naked eye view.

The image below shows 5 minutes to the total eclipse. Everything still _appears_ bright, but the Sun is a thin crescent. The next image, at the minus 1 minute mark, shows Mt Jefferson disappearing as it enters the total eclipse, and then things start changing fast.

Frames, in order: -5 minutes; -1 minute; -10 seconds; total eclipse +0 seconds; total eclipse +60 seconds; total eclipse +80 seconds; total eclipse +120 seconds; +10 seconds.

By 1 minute after the eclipse, things were starting to appear normal again.

The images resemble what I saw, much by chance: due to the attributes of the video camera and that I had zoomed in. In the middle of the eclipse, it was not midnight-dark, but rather a very late after-sunset dark. And it looked like there was a sunset in all directions.

If I see another total eclipse I'll film the landscape again, and this time pay more attention to the video setup. My camera was bumped this time, and it ended up a bit crooked, plus there's a telegraph pole in the way.

## Montage

10 x 1 minutes:

This shows the different light levels by taking 10 vertical strips from 1 minute intervals and arranging them side by side.
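
The strip idea can be sketched in a few lines of Python. This is only an illustration of the technique, not how the montage above was actually produced: it assumes each frame is a same-sized grid of pixel values (a list of rows), and the function name `strip_montage` is made up for the example.

```python
def strip_montage(frames):
    """Build one image from N same-sized frames: vertical strip i of the
    result is copied from frame i, so time runs across the picture."""
    n = len(frames)
    height = len(frames[0])
    width = len(frames[0][0])
    montage = []
    for y in range(height):
        row = []
        for i, frame in enumerate(frames):
            # Column range of strip i; integer division spreads any
            # remainder so the strips always cover the full width.
            left = i * width // n
            right = (i + 1) * width // n
            row.extend(frame[y][left:right])
        montage.append(row)
    return montage


# Three tiny 2x6 "frames", each a single brightness level, standing in
# for video frames taken one minute apart.
frames = [[[level] * 6 for _ in range(2)] for level in (9, 5, 1)]
for row in strip_montage(frames):
    print(row)
```

Reversing the list of frames before calling the function flips the direction in which time runs across the image.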
To place Mt Jefferson in the eclipse with the video I have, I had to change the time order from right to left, so that the right-most frame is -5 minutes, and the left-most is +2 minutes. Here is a left to right version.

## Eclipse Prominences

Immediately after totality I briefly observed the Sun through a pair of hand-held 20x binoculars (don't do this yourself; I timed it carefully so I was at no risk of seeing the end by accident). I observed three areas of solar prominences and loops in incredible clarity. Their color was wild: an intense violet that I've only seen once before, on some flowers in direct sunlight. And the clarity was stunning: I could see fine lines and details.

I assumed that these details would be captured countless times by photographers, but browsing the images posted so far, including the NASA videos, I've seen no pictures that do them justice. Some images are blurry, but most over-expose these prominences (eg, the top image on Wikipedia's Solar Eclipse page), and they appear as saturated violet/white blobs that bleed over the Moon's disk. I suspect the human eye has a better dynamic range than cameras, and can handle seeing these and the corona at the same time, whereas photographers set light levels to capture the faint corona, which over-saturates the prominences.