AWS EC2 Virtualization 2017: Introducing Nitro (Brendan Gregg's Blog)

Hardware virtualization for cloud computing has come a long way, improving performance using technologies such as VT-x, SR-IOV, VT-d, NVMe, and APICv. At Netflix, we've been using these technologies as they've been made available for instance types in AWS EC2. The latest AWS hypervisor, Nitro, uses everything to provide a new hardware-assisted hypervisor that is easy to use and has near bare-metal performance. It's an exciting development in cloud computing: hardware virtualization is now fast.

I've summarized hypervisor developments in EC2 with the above diagram. The columns show dimensions of instance performance, ordered by importance for typical workloads at Netflix (CPU-bound being the most important). The rows of this table are virtualization types, and the cells show the type of virtualization and are colored by the expected performance. You may first notice that, over time, green is creeping in from the left. That's deliberate engineering: optimizing the most important workloads first.

Each dimension has progressed through these stages:

1. **Virtualized in Software**: While this can support an unmodified guest OS, many operations are emulated and slow. Apps may run 2x to 10x slower, or worse.
2. **Paravirtualization**: The hypervisor provides efficient hypercalls, and the guest OS uses drivers and kernel modifications to call these hypercalls. It's using software and coordination between the hypervisor and guest to improve performance. I'd expect measurable overhead of 10% to 50% (depending on the PV type and workload).
3. **Virtualized in Hardware**: Hardware support for virtualization, and near bare-metal speeds. I'd expect between 0.1% and 1.5% overhead.

I'll summarize each row in the diagram, in chronological order:

## 1. Fully Emulated
Remember the original VMware x86 hypervisor from 1998? It's amazing to recall that it was even possible to virtualize x86 before processors had hardware-assisted virtualization (Intel VT-x and AMD-V), which was added in 2005 and 2006. This first x86 hypervisor used emulation and binary translation for privileged operations, such as syscalls and page table operations. This had a noticeable performance cost, especially for I/O-heavy workloads. First impressions endure, and many of us were introduced to hardware virtualization by something that was known for being "slow." But that was nearly two decades ago, and I don't think this hypervisor type ever ran on EC2.

## 2. Xen PV 3.0

(With Xen, there are many different possible configurations, including some I'm leaving out. I'm including this one as it helps tell the story of virtualization.)

Enter paravirtualization, originally introduced in Xen, where the guest has been modified to be aware of the hypervisor and to make efficient hypercalls. In this configuration, the AMI and boot are paravirt (PV), the kernel is making hypercalls instead of privileged instructions, and the system is using paravirt network and storage drivers. This provides a performance improvement, but there are still significant overheads with privileged operations (syscalls and page table events), slowing I/O – although it is faster than before, so perhaps I should have colored that cell yellow instead of red. This is before processor hardware virtualization for CPUs and memory (Intel VT-x, AMD-V). The first instances on EC2, m1.small, were configured like this (thanks @cperciva).

## 3. Xen HVM 3.0

This row shows a more recent hypervisor configuration, running on a processor with hardware virtualization for CPUs and memory (VT-x), and using paravirt drivers for network and storage devices. The AMI and boot are now HVM. Interrupts and timers haven't been paravirtualized yet.
I also started coloring "Motherboard and Boot" green, because instances of this type can boot faster than a bare-metal machine can, despite software virtualization.

## 4. Xen HVM 4.0.1
This configuration boots HVM and uses PVHVM drivers. Those drivers are also called [PV on HVM]: paravirt drivers that use HVM features. While this is using an HVM AMI, to help differentiate it I call these instances "PVHVM," after the drivers. This improves some kinds of workloads, including interrupts and timers.

Things started to become confusing here for two reasons. Instead of glossing over these details, let me dig in.

The first source of confusion was the AMI types. AWS EC2 uses a different image type and boot process for PV and HVM, as described on the [Linux AMI Virtualization Types] page. People then began referring to the running instances as PV or HVM, but it's more complex than that, because HVM can boot and then run paravirt drivers (PV), and can also run paravirt-on-HVM drivers (PVHVM). Most (or all) of the "HVM" instances we use on EC2 are HVM with PVHVM drivers.

The second source of confusion was performance. Earlier "HVM" versions didn't use as much paravirt, and could even be slower than "PV," causing many to recommend "PV" over "HVM." This recommendation quickly became out of date. It's all a bit confusing, and I wrote about this in 2014: [Xen Modes].

When I joined Netflix in 2014, we had begun transitioning from PV to HVM (PVHVM). I came from the world of containers, and expected to be busy wrestling with the overheads of hardware virtualization and details such as PV vs PVHVM, but I found that workloads were already running pretty fast on these instances. This was because most of our workloads are CPU and memory bound, which were already virtualized in hardware. But not all workloads: some are network bound (proxies) and storage bound (databases).

## 5. Xen AWS 2013

Starting in 2013, some EC2 instance types began supporting hardware virtualization for network interfaces: Single Root I/O Virtualization (SR-IOV). The first was c3. AWS called this [enhanced networking].
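From inside a guest, one way to tell these configurations apart is to look at what the kernel exposes. Here's a minimal sketch (assuming a Linux guest; the sysfs/DMI paths shown are the common Xen/KVM indicators, and the exact values vary by instance type):

```python
from pathlib import Path

def guest_virt_hints():
    """Collect a few common hypervisor hints from sysfs/DMI.

    Returns a dict of hint name -> file contents (or None if unavailable).
    """
    paths = {
        "hypervisor_type": "/sys/hypervisor/type",          # "xen" on Xen guests
        "dmi_sys_vendor": "/sys/class/dmi/id/sys_vendor",   # e.g. "Xen", "Amazon EC2"
        "dmi_product": "/sys/class/dmi/id/product_name",    # e.g. "HVM domU"
    }
    hints = {}
    for name, path in paths.items():
        try:
            hints[name] = Path(path).read_text().strip()
        except OSError:
            hints[name] = None  # file missing or unreadable on this system
    return hints

print(guest_virt_hints())
```

On a non-virtualized machine (or outside Linux) all hints come back `None`; on a Xen HVM guest you'd typically see `hypervisor_type` of "xen" and a product name mentioning domU.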
This was initially used via the ixgbe driver for speeds up to 10 Gbps, then the ena driver for speeds up to 25 Gbps. My colleague Amer Ather was responsible for testing it and making it work at Netflix, and posted some of his results: [2 Million Packets Per Second on a Public Cloud Instance]. Impressive! With a solution in hand for network performance, the next target was storage.

## 6. Xen AWS 2017

In 2015, AWS launched c4, which used hardware virtualization for EBS volumes. This was extended to instance storage devices for the x1.32xlarge in 2016. Starting in 2017 with the [i3 instance type], AWS EC2 began supporting hardware virtualization for storage devices: using SR-IOV and the nvme storage driver. Amer has tested and deployed this as well, and shared some results: [3 Million Storage IOPS on AWS Cloud Instance]. This is a great development, and I've been working on a blog post to describe these new instance types: with hardware virtualization for CPU, memory, network, *and* storage. Before I could even finish this post, AWS launched Nitro.

## 7. AWS Nitro 2017
As was announced at AWS re:Invent last night, and covered in Anthony Liguori's talk today ([CMP332]: video), and the bare metal talk ([CMP330]: video), the [c5 instance type] uses a new hypervisor called Nitro. Nitro is lightweight: it is based on the KVM core kernel module, but does not use many of the other KVM components, such as QEMU. There is also no dom0 or IDD (isolated driver domain) involved in the I/O path. Direct metal access.

Before Nitro: I/O initialized via dom0 (CMP332)

After Nitro: Direct metal I/O access (CMP332)
The aim of Nitro is to provide performance that is "indistinguishable from metal." It not only uses SR-IOV for hardware virtualization of network and storage I/O (provided by custom silicon cards from Annapurna Labs), but it also has hardware virtualization support for interrupts: using posted interrupts and APICv to reduce the number of VM exits. Improving interrupt performance has been described as the [last battleground] for hardware virtualization performance.

As Anthony explained in his talk, these previous hardware virtualization developments were components of Nitro. They were being launched piecemeal, improving performance of pre-Nitro systems. Nitro is easier to use, as it uses all these technologies by default (that's why I left the "With" cell blank). The c5 pictured at right is EBS-only, so that diagram doesn't show direct metal access for ephemeral drives (which we'll see in other Nitro instances).

I've been investigating the overhead of Nitro, and have so far found it to be minuscule, often less than 1% (it's hard to measure). Nitro's performance is near-metal.

I'm also excited about Nitro because it exposes all PMCs. I previously posted [The PMCs of EC2: Measuring IPC], covering the architectural set of PMCs that were recently exposed to certain instance types in EC2. That was only seven PMCs. On the c5 Nitro instances, you have hundreds of PMCs, and can truly analyze low-level CPU performance in detail. This should help find wins of 5%, 10%, and more.

The c5s were the first to use Nitro, but also at re:Invent the [m5 instance type] was launched, also based on Nitro. AWS have said that eventually most (or all) instances will use the Nitro hypervisor, with the exception of the new Bare Metal instances.

## 8. AWS Bare Metal 2017

Also announced at AWS re:Invent: the Amazon EC2 Bare Metal instances, which are just that – bare metal servers. 0% performance overhead. Run whatever you like: Xen, KVM, containers.
Access all PMCs, and other processor features. It was covered in detail today in a talk by Matthew Wilson and Aaron Blasius ([CMP330]: video).

## In summary

I hope the diagram now makes more sense, and summarizes the virtualization development journey:
I'll leave you with one parting thought: the Bare Metal instance types are huge (e.g., 72 CPUs), and you'll probably want to divide them up into cloud instances. Do you think you can set up a hypervisor to do that with <1% overhead? Even containers can have hidden overheads (like Docker's use of overlayfs or bridge networks). I'd personally find that a fun and interesting challenge, but I think it will be hard to beat Nitro. I imagine that, for most people, Nitro is exactly what they want.

## Acknowledgements & References

My EC2 virtualization diagram began as a Xen modes diagram from Lars Kurth at LinuxCon EU, also published by George Dunlap on xenproject.org in 2012, which I developed further here in 2014, and which Lars is now updating on xen.org. In my new diagram, to show the importance of hardware virtualization, I've made that green, and paravirtualization is now blue. Thanks to Amer Ather of Netflix for help and input, and from AWS, Matt Wilson, Anthony Liguori, Greg Dunn, and others, for their work on virtualization and helping me understand how it all works. Thanks to Deirdré Straughan for edits. Disclaimer: this is not an officially-endorsed AWS post.
More reading about these virtualization topics:

- https://wiki.xen.org/wiki/Xen_Project_Software_Overview
- (older) https://wiki.xenproject.org/wiki/Virtualization_Spectrum
- https://wiki.xenproject.org/wiki/Understanding_the_Virtualization_Spectrum
- https://wiki.xen.org/wiki/PV_on_HVM
- http://www.brendangregg.com/blog/2014-05-07/what-color-is-your-xen.html
- http://www.brendangregg.com/blog/2014-05-09/xen-feature-detection.html
- http://techblog.cloudperf.net/2016/05/2-million-packets-per-second-on-public.html
- http://techblog.cloudperf.net/2017/04/3-million-storage-iops-on-aws-cloud.html
- https://www.slideshare.net/AmazonWebServices/sdd406-amazon-ec2-instances-deep-dive-aws-reinvent-2014
- Where I introduced this diagram: https://www.slideshare.net/brendangregg/how-netflix-tunes-ec2-instances-for-performance
- AWS re:Invent 2017: Amazon EC2 Bare Metal Instances (CMP330): https://www.youtube.com/watch?v=o9_4uGvbvnk
- AWS re:Invent 2017: C5 Instances and the Evolution of Amazon EC2 Virtualization (CMP332): https://www.youtube.com/watch?v=LabltEXk0VQ

[Xen Modes]: http://www.brendangregg.com/blog/2014-05-07/what-color-is-your-xen.html
[enhanced networking]: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html
[2 Million Packets Per Second on a Public Cloud Instance]: http://techblog.cloudperf.net/2016/05/2-million-packets-per-second-on-public.html
[3 Million Storage IOPS on AWS Cloud Instance]: http://techblog.cloudperf.net/2017/04/3-million-storage-iops-on-aws-cloud.html
[CMP332]: https://www.portal.reinvent.awsevents.com/connect/sessionDetail.ww?SESSION_ID=16761
[last battleground]: http://compas.cs.stonybrook.edu/~mferdman/downloads.php/VEE15_Comprehensive_Implementation_and_Evaluation_of_Direct_Interrupt_Delivery.pdf
[Linux AMI Virtualization Types]: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/virtualization_types.html
[The PMCs of EC2: Measuring IPC]: http://www.brendangregg.com/blog/2017-05-04/the-pmcs-of-ec2.html
[c5 instance type]: https://aws.amazon.com/blogs/aws/now-available-compute-intensive-c5-instances-for-amazon-ec2/
[i3 instance type]: https://aws.amazon.com/blogs/aws/now-available-i3-instances-for-demanding-io-intensive-applications/
[PV on HVM]: https://wiki.xen.org/wiki/PV_on_HVM
[CMP330]: https://www.portal.reinvent.awsevents.com/connect/sessionDetail.ww?SESSION_ID=15939
[m5 instance type]: https://aws.amazon.com/about-aws/whats-new/2017/11/introducing-amazon-ec2-m5-instances/

micro:bit, bluetooth and Linux (blog'o'less)

After various efforts I was able to read micro:bit sensor data from Linux. While temperature and accelerometer data are not too difficult to collect, the UART service took a little more time, mainly due to a known issue in the micro:bit BLE implementation.

Here you can find my observations and experiments, with sample Python scripts: https://github.com/alcir/microbit-ble

Small Business Accounting Software Woes (/dev/dump)

I'm so disappointed with the online accounting software options available to me, and I've spent far, far too much time in the past couple of days looking for an accounting solution for my new business. The current state of affairs makes me wonder if just using a spreadsheet might be as easy.

I am posting my experiences here for two reasons.

  1. To inform others who might have similar needs, and
  2. To inform the hopefully smart people at these companies, so maybe they will improve their products.
Let me start with a brief summary of my needs:

That's it. Nothing super difficult, right?  You'd think there would be dozens of contenders who could help me.

You'd be wrong.

Here's what I looked at, and their deficiencies:

Freshbooks 



I really like most of what Freshbooks has to offer, and this was my starting point. Super easy to use, an integration with Gusto, and their invoicing solution is super elegant. Unfortunately, their lack of reconciliation and double-entry accounting (or any of the other "real" accounting stuff) disqualifies them. Adding to the problem, I already use them for my personal consulting business (where I've been a happy user), and they don't support multiple businesses on their "Classic Edition".

Then there is the whole confusion between "New Freshbooks" and "Classic Freshbooks".

This is a company that states they intend to keep two separate software stacks, with about 90% functionality overlap, running ~forever. Why? Because the Classic edition has some features (and some integrations) that are lacking in the new one. (I've been informed that my use patterns indicate that I should stay on the "Classic" edition forever because of my heavy use of Time Tracking.) Some of us with real-world software engineering experience know how costly and hateful it is to have multiple simultaneous versions of a product in production. Freshbooks' approach here, with no plans to merge the functionality, is about the most boneheaded decision I've seen engineering management take.

Being stuck on the "Classic Edition" makes me feel like a loser, but really it's a sign that their own product is the loser.  I have to believe at some point one product or the other is going to be a dead end.

Quickbooks Online


This is a product that is well recommended, and probably one of the most widely used. It has so much capability. It also lacks the "hacked together by a bunch of different engineering teams that didn't talk to each other" feeling that their desktop product has. (Yes, I have experience with Quickbooks Pro, too. Sad to say.)  It's probably a good thing I can't look at their code behind the curtain.

The biggest, maybe even only, failing they have for my use case is their inability to bill against clients that are in a different currency. Wait, they are multicurrency capable, right?  Uh, no they aren't. If I can't record my billable hours against a client in another country in their preferred currency, then whatever you think your "multicurrency" support is doesn't count. I have international clients that demand billing in their local currency, so this is a non-starter for me. Customers have asked them for this feature before, and they have ignored it. A major, and honestly unexpected, fail.

Cost wise they aren't the cheapest, but this one feature absence is a show stopper for me, otherwise I'd probably have settled here.

Xero


Xero is another of the main companies, and in Gartner's magic quadrant as their leader in the sector. I didn't actually try them out -- though I did research. Their shortcomings for me were: price (multi-currency support requires me to pay $70 / month, which is about 2x all the others), and lack of time tracking. Sure, I can add an integration from some other company like Tsheets, for another $20 / month. But now this solution is like 3x the cost of everyone else.

One feature that Xero includes for that $70 is payroll processing -- but only for a handful of states (California is one), and I can't seem to find any reviews for folks who have used them.   If I want to use an outside company with a longer track record and broader coverage across states, like SurePayroll or Gusto or ADP, I will wind up paying double.

If Xero would change their menu somewhat (make it à la carte), we'd be able to work together. Let me integrate with Gusto, and not have to pay exorbitant fees for multi-currency support. Add time tracking and it would be even better.

Arguably I could stop being such a penny pincher, and just go with Xero + Tsheets or somesuch. Outside of the crazy expensive options for companies that can afford a full time accountant (Sage, NetSuite, looking at you!), this was the most expensive option.  I'd also have to use Xero's payroll service, and I'm not sure

ZipBooks


At first blush, ZipBooks looked like a great option. On paper they have everything I need -- they even partnered with Gusto, and claim to have multicurrency support.  Amazingly, they are even free. Of course, if you elect to use some of their add-ons, you pay a modest fee, but from a pure price perspective, this looks like the cheapest.

Unfortunately, as I played with their system, I found a few major issues. Their multi-currency support is a bit of an inconvenient joke. They don't let you set a per-client currency. Instead you change the currency for the entire account, then generate invoices in that currency (or accept payments), then have to switch back to the home currency. This is account wide, so you better not have more than one person access the account at a time. The whole setup feels really hinky, and to be honest I just don't trust it.

Second, their bank integration is (as of today) broken -- meaning the website gives me conflict errors before I even can select a bank (I wanted to see if my business bank -- a regional smaller bank -- is on their list). So, not very reliable.

Finally, their support is nearly non-existent. I sent several questions to them through their on-line support channel, and got back a message "ZipBooks usually responds in a day". A day. Other companies I looked at took maybe 10-20 minutes to respond -- I still have not received a response from ZipBooks.

I need a service that supports real multicurrency invoicing, is reliable, and with reachable support. Three strikes for ZipBooks.  Damn, I really wanted to like these guys.

Kashoo


Kashoo was well reviewed, but I had some problems with them. First their only payroll integration is with SurePayroll. I hate being locked in, although I could probably overlook this. Second, they don't have any time tracking support. Instead they partner with Freshbooks, but only the "Classic Edition" (and apparently no plans to support the "New Freshbooks".)  A red flag.

And, that brings in the Freshbooks liability (only one company, so I can't have both my old consulting business and this new one on the same iOS device for example), and I'd have to pay for Freshbooks service too.

On the plus side, the Kashoo tech support (or pre-sales support?) was quite responsive.  I don't think they are far off the mark.

Wave Accounting 


Wave is another free option, but they offer payroll (although full service only in five states) as an add-on.  (They also make money on payment processing, if you use that.)  Unfortunately, they lacked support for integrations, time tracking, or multi-currency support.  I'd like to say close but no cigar, but really in this case, it's just "no cigar".  (I guess you get what you pay for...)

Zoho Books


Zoho Books is another strong option, well regarded.  So far, it seems to have everything I need except any kind of payroll support.  I'd really love it if they would integrate with Gusto.  I was afraid that I would need to set up with Zoho Projects and pay another service fee, but it looks -- at least so far, from my trial -- like this won't be necessary.

So my feature request is for integration with Gusto.  In the meantime, I'll probably just handle payroll expenses by manually copying the data from Gusto.

Conclusion


So many, so close, and yet nothing actually hits the mark.  (These aren't all the options I looked at, but they are the main contenders.  Some weren't offered in the US, or were too expensive, or self-hosted.)  For now I'm going to try Zoho.  I will try to update this in a few months when I have more experience.

Updates: (As of Nov. 30, 2017) 


  1. Zoho has since introduced Zoho Payroll, and they contacted me about it.  It's only available for California at this time, and has some restrictions.  I personally don't want to be an early adopter for my payroll processing service, so I'm going to stick with Gusto for now.   Zoho's representative did tell me that they welcome other payroll processing companies to develop integrations for Zoho Books.   I hope Gusto will take notice.
  2. ZipBooks also contacted me.  They apologized for the delays in getting back to me -- apparently their staff left early for Thanksgiving weekend.  They indicated that they have fixed whatever bug caused me to be unable to link my bank account.  Their COO also contacted me, and we had a long phone call, mostly to discuss my thoughts and needs around multi-currency support.  I'm not quite ready to switch to them, but I'd keep a close eye on them.  They do need to work to improve their initial customer service experience, in my opinion.
  3. It looks like my own multi-currency needs may be vanishing, as my primary external customer has agreed to be billed in USD and to pay me in USD.  That said, I want to keep the option open for the future, as I may have other international customers in the future.
  4. None of the other vendors reached out to me, even though I linked to them on Twitter.  The lack of response itself is "significant" in terms of customer service, IMO. 

CBOR vs. JSON vs. libnvpair (Josef "Jeff" Sipek)

My blahg uses nvlists for logging extra information about its operation. Historically, it used Sun libnvpair. That is, it used its data structures as well as the XDR encoding to serialize the data to disk.

A few months ago, I decided to replace libnvpair with my own nvlist implementation—one that was more flexible and better integrated with my code. (It is still a bit of a work-in-progress, but it is looking good.) The code conversion went smoothly, and since then all the new information was logged in JSON.

Last night, I decided to convert a bunch of the previously accumulated libnvpair data files into the new JSON-based format. After whipping up a quick conversion program, I ran it on the data. The result surprised me—the JSON version was about 55% of the size of the libnvpair encoded input!

This piqued my interest. I re-ran the conversion but with CBOR (RFC 7049) as the output format. The result was even better with the output being 45% of libnvpair’s encoding.

This made me realize just how inefficient libnvpair is when serialized. At least part of it is because XDR (the way libnvpair serializes data) uses a lot of padding, while both JSON and CBOR use a more compact encoding for many data types (e.g., an unsigned number in CBOR uses 1 byte for the type and 0, 1, 2, 4, or 8 additional bytes based on its magnitude, while libnvpair always encodes a uint64_t as 8 bytes plus 4 bytes for the type).
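To make the size difference concrete, here's a minimal sketch (Python, stdlib only) of the RFC 7049 variable-length encoding for unsigned integers, next to a rough XDR-style fixed layout (4-byte type tag plus 8-byte value); note the `xdr_like_uint64` helper is a simplification for illustration, not libnvpair's actual framing, which adds further overhead for names and alignment:

```python
import struct

def cbor_uint(n):
    """Encode an unsigned integer as CBOR (RFC 7049 major type 0)."""
    if n < 24:
        return bytes([n])                      # value fits in the type byte itself
    if n < 2**8:
        return b"\x18" + struct.pack(">B", n)  # 1 extra byte
    if n < 2**16:
        return b"\x19" + struct.pack(">H", n)  # 2 extra bytes
    if n < 2**32:
        return b"\x1a" + struct.pack(">I", n)  # 4 extra bytes
    return b"\x1b" + struct.pack(">Q", n)      # 8 extra bytes

def xdr_like_uint64(n):
    """Rough XDR-style size: 4-byte type tag + 8-byte big-endian value."""
    return struct.pack(">IQ", 0, n)

# A small value costs 1 byte in CBOR vs 12 bytes in the fixed layout.
print(len(cbor_uint(10)), len(xdr_like_uint64(10)))  # prints: 1 12
```

Since real-world nvlists tend to be full of small integers, this per-value difference compounds quickly across a large dataset.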

Since CBOR is 79% of JSON’s size (and significantly less underspecified compared to the minefield that is JSON), I am hoping to convert everything that makes sense to CBOR. (CBOR being a binary format makes it harder for people to hand-edit it. If hand-editing is desirable, then it makes sense to stick with JSON or other text-based formats.)

The Data & Playing with Compression

The blahg-generated dataset that I converted consisted of 230866 files, each containing an nvlist. The following byte counts are a simple concatenation of the files. (A more complicated format like tar would add a significant enough overhead to make the encoding efficiency comparison flawed.)

Format Size % of nvpair
nvpair 471 MB 100%
JSON 257 MB 54.6%
CBOR 203 MB 45.1%

I also took each of the concatenated files and compressed it with gzip, bzip2, and xz. In each case, I used the most aggressive compression by using -9. The percentages in parentheses are comparing the compressed size to the same format’s uncompressed size. The results:

Format Uncomp. gzip bzip2 xz
nvpair 471 MB 37.4 MB (7.9%) 21.0 MB (4.5%) 15.8 MB (3.3%)
JSON 257 MB 28.7 MB (11.1%) 17.9 MB (7.0%) 14.5 MB (5.6%)
CBOR 203 MB 26.8 MB (13.2%) 16.9 MB (8.3%) 13.7 MB (6.7%)

(The compression ratios are likely artificially better than normal since each of the 230k files has the same nvlist keys.)
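This kind of comparison is easy to reproduce with Python's stdlib compressors; a sketch (where the repetitive `sample` blob stands in for the concatenated data files):

```python
import bz2
import gzip
import lzma

def compressed_sizes(data):
    """Return {algorithm: compressed byte count} at maximum effort (-9)."""
    return {
        "gzip": len(gzip.compress(data, compresslevel=9)),
        "bzip2": len(bz2.compress(data, compresslevel=9)),
        "xz": len(lzma.compress(data, preset=9)),
    }

# Highly repetitive input (like 230k nvlists sharing the same keys)
# compresses very well, as noted above.
sample = b'{"key": 12345}\n' * 10000
for name, size in sorted(compressed_sizes(sample).items()):
    print(name, size, f"{100.0 * size / len(sample):.2f}%")
```

The ratios on such synthetic input will be far better than on real data, but the relative ordering of the formats is what matters.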

Since tables like this are hard to digest, I turned the same data into a graph:

CBOR does very well uncompressed. Even after compressing it with a general purpose compression algorithm, it outperforms JSON with the same algorithm by about 5%.

I look forward to using CBOR everywhere I can.

2017-11-14 (Josef "Jeff" Sipek)

Doug Engelbart Institute — Online exhibits, historic videos, texts, archive photos, and stories about Doug Engelbart, the inventor of the mouse, hypertext, and GUIs…all in the 1960s

Flight recorders data inspection by Airbus

Parsing JSON is a Minefield

Completely Painless Programmer’s Guide to XYZ, RGB, ICC, xyY, and TRCs — Brain-hurting amount of information about color profiles, etc.

darktable — A Lightroom-like open source software

World plugs — Info about every electric plug form factor in the world

TLS close-notify .... what were they thinking? (/dev/dump)

Close-Notify Idiocy?


TLS (and presumably SSL) require that implementations send a special disconnect message, "close-notify", when closing a connection.  The precise language (from TLS v1.2) reads:

The client and the server must share knowledge that the connection is
ending in order to avoid a truncation attack. Either party may
initiate the exchange of closing messages. 
close_notify 
This message notifies the recipient that the sender will not send
any more messages on this connection. Note that as of TLS 1.1,
failure to properly close a connection no longer requires that a
session not be resumed. This is a change from TLS 1.0 to conform
with widespread implementation practice. 
Either party may initiate a close by sending a close_notify alert.
Any data received after a closure alert is ignored. 
Unless some other fatal alert has been transmitted, each party is
required to send a close_notify alert before closing the write side
of the connection. The other party MUST respond with a close_notify
alert of its own and close down the connection immediately,
discarding any pending writes. It is not required for the initiator
of the close to wait for the responding close_notify alert before
closing the read side of the connection.

This has to be one of the stupider designs I've seen.

The stated reason for this is to prevent a "truncation attack", where an attacker terminates the session by sending a clear-text disconnect (TCP FIN) message, presumably just before you log out of some sensitive service, say GMail.

The stupid thing here is that this is for web apps that want to send a logout, and don't want to wait for confirmation that the logout has occurred before sending confirmation to the user.  So this logout is unlike every other RPC.  What...?!?

Practical Exploit?


It's not even clear how one would use this attack to compromise a system... an attacker won't be able to hijack the actual TLS session unless they already pwned your encryption.  (In which case, game over, no need for truncation attacks.)  The idea in the truncation attack is that one side (the server?) still thinks the connection is alive, while the other (the browser?) thinks it is closed.  I guess this could be used to cause extra resource leaks on the server... but that's what keep-alives are for, right?

Bugs Everywhere


Of course, close-notify is the source of many bugs (pretty much none of them security critical) in TLS implementations.  Go ahead, Google... I'll wait...  Java, Microsoft, and many others have struggled in implementing this part of the RFC.

Even the TLS v1.1 authors recognized that "widespread implementation practice" is simply to ignore this part of the specification and close the TCP channel.

So you may be asking yourself, why don't implementations send the close-notify ... after all, sending a single message seems pretty straightforward and simple, right?

Semantic Overreach


Well, the thing is that on many occasions, the application is closing down.  Historically, operating systems would just close() their file descriptors on exit().  Even for long running applications, the quick way to abort a connection is ... close().  With no notification.  Application developers expect that close() is a non-blocking operation on network connections (and most everywhere else)1.

Guess what, you now cannot exit your application without sending this, without breaking the RFC.   That's right, this RFC changes the semantic of exit(2).  Whoa.

That's a little presumptive, dontcha think?

Requiring implementations to send this message means that close() now grows some kind of new semantic, where the application has to stop and wait for the message to be delivered.  Which means TCP has to be flowing and healthy.  The only other RFC-compliant behavior is to block and wait for it to flow.

What happens if the other side is stuck, and doesn't read, leading to a TCP flow control condition?  You can't send the message, because the kernel TCP code won't accept it -- write() would block, and if you're in a non-blocking or event driven model, the event will simply never occur.  Your close() now blocks forever.

Defensively, you must insert a timeout somehow -- in violation of the RFC.  Otherwise your TCP session could block forever.  And now you have to contemplate how long to hold the channel open?  You've already decided (for whatever other reason) to abort the session, but you now have to wait a while ... how long is too long?  And meanwhile this open TCP sits around consuming buffer space, an open file descriptor, and perhaps other resources....
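The defensive pattern ends up looking something like this minimal sketch (plain sockets standing in for a TLS channel; the `close_notify` payload here is just a stand-in byte string, not a real TLS alert record):

```python
import socket

def close_with_deadline(sock, notify=b"close_notify", timeout=5.0):
    """Best-effort close: try to send a closing message, but never block forever."""
    try:
        sock.settimeout(timeout)
        sock.sendall(notify)  # may time out if the peer has stopped reading
    except OSError:
        pass  # RFC-violating but pragmatic: abort rather than hang
    finally:
        sock.close()

# Demonstration on a local socket pair: the close message gets through
# when the peer is healthy, and the timeout bounds the cost when it isn't.
a, b = socket.socketpair()
close_with_deadline(a, timeout=1.0)
print(b.recv(64))
b.close()
```

The open question the RFC leaves you with is exactly the one raised above: there is no principled way to pick that timeout value.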

A Bit of Sanity


The sensible course of action, treating a connection abort for any reason as an implicit close notification, was simply "not considered" from what I can tell.

In my own application protocols, when using TLS, I may violate this RFC with prejudice. But then I also am not doing stupid things in the protocol like TCP connection reuse.  If you close the connection, all application state with that connection goes away.  Period.  Kind of ... logical, right?

Standards bodies be damned.

1. The exception here is historical tape devices, which might actually perform operations like rewinding the tape automatically upon close(). I think this semantic is probably lost in the mists of time for most of us.

Google Traffic, iOS edition Josef "Jeff" Sipek

Several years ago, I wrote about how Google gets traffic information and how to turn off this location reporting on Android phones. Since then I’ve switched to iPhones. While I normally use the built-in Maps app, I keep Google Maps installed as a fallback—just in case.

I upgraded my phone recently and so I spent some time going through all the apps and making sure they worked and didn’t have more access than necessary. This is when I discovered that the Google Maps app for iOS defaults to collecting location information and sharing it with Google. Given my previous post, this isn’t really surprising.

Turning it off

Anyway, as with the Android post, here are the steps to limit Google’s collection of location information.

First of all, in Settings → Privacy → “Location Services”, I changed Google Maps’s permission to “While Using the App”. If I’m not using the app, then it doesn’t need to know where I am.

Second, in the app itself, go to: Settings → “About, terms & privacy” → “Location data collection”. That’s right, this setting is buried in what appears to be a page with the boring legal notices.

And then turn off the toggle:

That should do it…at least assuming that Google honors the settings in its own app.

Brilliant Jerks in Engineering Brendan Gregg's Blog


Notice board at Ericsson, Stockholm (pic by DeirdreS)
Many of us have worked with them: the engineering jerk who is brilliant at what they do, but treats others like trash. Some companies have a policy not to hire them (eg, Netflix's "[No Brilliant Jerks]", which was one of the many reasons I joined the company). There's also the "[No Asshole Rule]", popularized by a bestselling book of this title, which provides the following [test]: Here's a test for you or your company: Would you tolerate a brilliant engineer who is also an asshole? (Or the more company-polite version: would you tolerate a brilliant jerk?)

There are numerous articles and opinions on the topic, including [Brilliant Jerks Cost More Than They Are Worth], and It's Better to Avoid a Toxic Employee than Hire a Superstar. My colleague Justin Becker is also giving a talk at QConSF 2017 on the topic: [Am I a Brilliant Jerk?].

It may help to clarify that "brilliant jerk" can mean different things to different people. To illustrate, I'll describe two types of brilliant jerks: the selfless and the selfish, and their behavior in detail. I'll then describe the damage caused by these jerks, and ways to deal with them.

**The following are fictional characters.** These are not two actual engineers, but are collections of related traits to help examine this behavior beyond the simple "no asshole rule." These are engineers who by default act like jerks, not engineers who sometimes act that way.

## Fictional Alice, the selfless brilliant jerk

Alice is a brilliant engineer. Alice cares about the company. She is direct and honest. If she believes that an unpopular position is right for the company, Alice does not hesitate to voice it. She will even browbeat others to make her point, often coming across as mean-spirited. Alice would point out that she wasn't being mean, she was just stating what is right, and that being mean shouldn't hurt the company anyway. She has little empathy for the feelings of others, and sees little business value in having it.
Alice is great at working individually on hard engineering problems. She gets along fine with her immediate team and manager, who understand her personality. She doesn't get on well with others whom she only meets occasionally. Outside of her team, Alice is known as a jerk, and people try to avoid working with her. Alice is great at fixing hard bugs, writing test suites, doing code merges, and other unglamorous work. If the company needs it done, she's happy to do it, and doesn't care much whether it furthers her own career.

While selfless jerks can be a net positive for the company, they can become more effective if they learn that being kind results in greater productivity. This topic was covered in the [Be Kind] post by boz.

Different companies may have different attitudes towards Alice: whether to tolerate her behavior or not (most reviewers of this post said "no," one said "it's a grey area"). Startups may tolerate Alice, for example, since the company is so small that everyone knows Alice and understands her personality. But I've described selfless jerks primarily for contrast with _selfish_ jerks.

## Fictional Bob, the selfish brilliant jerk

Bob is a brilliant engineer. Bob cares about Bob. He is selfish, lacks empathy, and has delusions of grandeur. He believes that any behavior is justified that benefits himself, including abusing and exploiting others, for which he shows no guilt or remorse. He can be charming and charismatic to get his way, causing people to ignore or excuse his bad behavior.

Below is a list of attributes that describe Bob, an extreme example of a brilliant engineering jerk. Not every brilliant jerk exhibits all of these behaviors, but I've seen each and every one of them firsthand.

**Bob interrupts others, and ignores their opinions**. He believes that he is the most important person in the room, and has no interest in what others have to say, frequently interrupting them.
He can monopolize conversations with long, exaggerated stories that flatter himself. Less-assertive engineers are effectively silenced, even if their opinions on the topic are the most valuable.

**Bob only does work that benefits himself**. He can work well on hard engineering problems, but only works on those he enjoys, or that help his career or promote his own earlier work. He creates new projects and immediately claims credit, but leaves the dirty work of finishing them to others, and avoids responsibility if they fail. He is brilliant at convincing the company to let him do what he wants, even when that ignores market demand or his own past performance. He never seriously mentors or trains other staff – he does not see that as useful to his own career.

**Bob bullies, humiliates, and oppresses individuals**. With non-technical people, he wins arguments by bamboozling them with irrelevant technical detail, making them feel dumb. With junior technical people, Bob likes to ridicule their ideas, letting everyone know how stupid they are, and how much smarter he is. When his technical specialties are needed, he makes people beg and grovel for his help, as another way to humiliate them. When others make mistakes, he enjoys shaming and mocking them with biting sarcasm and witty insults. He uses similar rhetoric in arguments, where he must always win, no matter the cost.

**Bob engages in displays of dominance in front of groups**. Bob likes to show everyone how important he is by how much he can get away with, including sheer rudeness. He is late to meetings (on purpose), puts his feet up on the table, then looks at his phone or laptop while ignoring everyone around him. He sometimes makes obscene remarks in the office, bragging: "If anyone else said that, they'd be fired!". He also insists on having a better laptop/desktop/monitor than everyone else, to display his status.
**Bob tries to assert authority over all areas of the company**, including those where he has no expertise at all. Areas he cannot control, he denigrates: eg, as an engineer, he will claim that marketing is stupid, useless, unnecessary, and that "anyone could do it."

**Bob is negative**. He trash-talks other technologies, companies, and people behind their backs, always finding something negative to say. He elevates his own status by slamming other people. He also attacks technologies that either don't leverage his own prior work, or don't conform to his own beliefs. Other engineers avoid new technologies, for fear of damaging ridicule from Bob.

**Bob manipulates and misleads**. Sometimes he misleads subtly, by presenting facts that are literally true in a way that is intentionally misleading. At other times he will simply lie, and do it with such confidence and assertiveness that he is almost always believed. He states his own preferences and opinions as facts.

**Bob uses physical intimidation**. Bob glares at those he doesn't like, and may invade people's personal space. He may also use violent gestures such as slamming fists on desks.

**A string of good employees have quit because of Bob**. Some engineers become fed up with Bob and quit. Talented engineers are driven out by Bob on purpose, to eliminate threats to his own status. Bob demonizes those who left, attributing past failures to them, and their successes to others who stayed, especially himself. In this way, he convinces management that losing those staff was good for the company, and stops them from realizing that the real problem is Bob. Some who have left, if asked, will cite other reasons for quitting, hoping to avoid becoming victims of Bob's smear campaigns.

**Bob gives great talks – about himself**. Because he is a brilliant engineer and a great public orator, he is a popular speaker at technical events.
In talks, he narrates self-enhancing stories and rewrites history to flatter himself, taking credit for other people's work – if not blatantly, then tacitly or by implication. He has a group of spellbound followers outside of the company who hero-worship and idolize him, and would love to work with him. He is well liked _from afar_.

**Bob exploits junior engineers**: Bob finds junior engineers who admire his brilliance, and encourages them to do work that elevates Bob's ideas and projects, reflecting glory back onto Bob. They become so invested in helping Bob's career growth that they have none of their own.

**Bob is a negative role model**. Bob can drag down the workplace or community by becoming a negative role model and having others imitate his behavior. Those who admire Bob become negative, bully others, and engage in similar personal attacks, hoping for his approval and to become Bob themselves. Others simply use Bob to excuse their own pre-existing bad behavior: if Bob can do it, so can I.

**Some of Bob's coworkers become accomplices, and gaslight his abuse**. They were there when he attacked and humiliated others, and they did nothing – or laughed along, encouraging Bob to continue. Bob likes to surround himself with such enablers. They may be otherwise reasonable people who have yet to understand that what they are witnessing is abuse. They may publicly defend Bob and deny that abuse happened, or minimize it, gaslighting Bob's victims ("everyone's a jerk sometimes").

**Bob refuses to change**. Bob knows that his behavior hurts people, but "that's their problem."

It bears repeating: I have seen each of these behaviors firsthand, from multiple brilliant jerks. Bob is an extreme fictional case who exhibits _all_ of these behaviors. One particular person may exhibit only some, without necessarily being a jerk. But, if you recognize many of these traits in a colleague – or in yourself – then, yes, you're probably dealing with a major-league jerk.
Some reviewers of this post have said that Bob seems unbelievable: no one could be anywhere near that bad! They are fortunate not to have experienced a Bob, which is the lesson here: your understanding of a "brilliant jerk" may differ from that of someone who has actually worked with a Bob, or another severe type of jerk.

## The problems caused by brilliant jerks

Problems caused by Alice, the selfless jerk, may include:

- Alice hurts or offends some employees with her attitude.
- Alice causes her team and manager to spend energy mending fences with others.
- Alice's projects may be less successful, as others avoid working with her.
- Alice discourages others from asking her questions, so her technical expertise is often wasted.

But Bob, the selfish jerk, can cause these additional problems:

- Bob silences many technical opinions, lowering the company's technical IQ.
- Bob creates extra work for others who must fix his abandoned projects.
- Bob demoralizes many staff, which hurts productivity.
- Bob causes stress-related psychological and physical illness in his victims.
- Bob causes some staff to occasionally skip work: increasing absenteeism.
- Bob drives staff to quit, who will never come back.
- Bob may strengthen the company's competition, who hire those who quit.
- Bob makes it difficult to hire other good staff (word gets around).
- Bob discourages customers and investors (word gets around).
- Other staff devise processes to work around Bob, reducing the company's efficiency.
- Other staff may sabotage Bob's work, which sabotages the company.
- Bob inspires other staff to imitate his behavior, multiplying the problem.
- Bob creates a hostile workplace environment: an invitation to lawsuits.

Chapter 2 in "The No Asshole Rule" covers more details (although for general staff, not just engineers), and has instructions for calculating your TCA: the Total Cost of Assholes for your organization.
## Dealing with brilliant jerks

There are two parts to this: helping the victims and staff who witness the behavior, and dealing with the jerks themselves. Both are big topics that I'll discuss here only briefly.

At some companies, no one is telling Alice or Bob that their behavior is inappropriate. Everyone sees the bad behavior, but thinks it must be tolerated because Alice and Bob are so brilliant and valuable. Wrong! One important step a company can take is to explicitly adopt a "no brilliant jerks" policy. Netflix has such a policy as part of the culture slide deck, now a [memo], which reads:
On a dream team, there are no “brilliant jerks.” The cost to teamwork is just too high. Our view is that brilliant people are also capable of decent human interactions, and we insist upon that. When highly capable people work together in a collaborative context, they inspire each other to be more creative, more productive and ultimately more successful as a team than they could be as a collection of individuals.
– Netflix culture memo
This policy isn't some useless feel-good text from a nameless source: it was originally published on CEO Reed Hastings' [slideshare account]. To be effective, such a policy for your company may also need to come from your CEO.

All Netflix candidates are told to read the culture deck (memo) when interviewing, and are told that, yes, we take it seriously. While this helps people realize that jerks should not be tolerated, it doesn't necessarily stop jerks from being hired in the first place: Bob is brilliant and charismatic and would probably pass the interview. However, he would then find himself at a company where his colleagues recognize his bad behavior as unacceptable, and are empowered to speak up about it.

In over three years at Netflix, I've worked with zero brilliant jerks. The "no brilliant jerks" policy works, and it's been great. If we hired any in that time, they either changed their ways or left the company before I could interact with them.

Some brilliant jerks can mend their ways: Alice might be motivated to change if she can be made to understand that her behavior is hurting the company, which she cares about. She should be encouraged to exercise empathy, and to leave others feeling positive and motivated to work harder, rather than demotivated. My colleague, Justin Becker, explores this in detail in his QCon talk, including the topic of emotional intelligence (EQ). In the next section I'll share an example.

As for Bob: he should be told that his behavior hurts people and the company, and given the opportunity to change – but the reality is that he probably doesn't care. He firmly believes that "nice guys finish last," and, so far, being a jerk has worked well for him. His managers have the power to change that equation, because they control things that Bob wants: they allow him to work on his pet projects and to speak at events, they give him promotions and bonuses, and ultimately they let him keep his job.
"I'd rather have a hole in my organization than an asshole."
– Fred Wilson, Velocity NY 2013 keynote.
As a colleague/victim/witness, you should report Bob's behavior to management, but you probably shouldn't ask them outright to fire Bob (among other reasons, what if someday you were thought to be a Bob?). Give management information, but let them decide how to act.

Actually firing a brilliant jerk is a complicated topic for a separate post, ideally written by a manager who has dealt with this. There's usually a process to follow, which unfortunately Bob may exploit to his advantage, showing improvement when needed to keep his job, but then reverting back to his bad old ways. He may also have convinced management that his technical skills and fame are so important that the company would fail without him. This isn't true, but fear may cause management to hesitate.

For management to deal effectively with Bob, they must themselves be convinced that his behavior should not be tolerated, regardless of his brilliance. For some companies, that will require truly understanding the damage that Bob causes (listed above), to justify taking action. For companies like Netflix with an explicit "no brilliant jerks" policy, it's much easier for management to take action, as they don't need to convince anyone that jerks are a problem: that's already covered in company policy.

Regular one-on-one meetings with staff, and scheduled skip-level meetings, should also help inform management about the damage jerks are causing. Netflix does this well: I have scheduled one-on-one meetings with my manager once every two weeks, their manager once a month, and their manager at least once a year. That's three levels of management I talk directly and in private with, without even having to ask for a meeting. We're also encouraged to give other employees direct and honest feedback, intervene if we see harassment, and escalate up to and including the CEO.

As for public speaking: Bob draws power from being a public face of the company.
Speaking events should be shared among staff who want to speak, and training can be made available to improve their skills (various companies offer this), so that Bob isn't the only good speaker. Conference organizers can also adopt a "no brilliant jerks" policy (some already do), and attendees can avoid conferences that host known jerks. If Alice needs to learn empathy, Bob needs to learn both empathy and to stop being selfish, and sharing public speaking or other rewarding projects is an example of the latter.

## When I acted like a jerk

Many people sometimes act like Alice, and it can be easy to talk them out of it (Alice herself is harder, since it's her by-default behavior). I'll explain this with a story, this time of a moment when I acted like a jerk.

Early in my career, an engineer at my company made a big mistake in my area of expertise, and sent an email that dodged responsibility and showed no path to fix it. I was furious and phoned the engineer: my intent was to make him realize that he'd made a big mistake, and put him on the right path. I was blunt, and told him off. I didn't enjoy doing so, but I felt I was doing a Good Thing for the company, and fixing a problem.

A week later, his manager phoned me unexpectedly. He told me that he was aware of my phone call, and didn't think I was technically wrong, but did I know that the engineer had been demotivated and unproductive since I talked to him, and was it my intent to make his staff unproductive? No, of course not. The manager continued: do you think you could have told my engineer what you needed to, in a way that left him feeling positive and motivated to fix it? Sure, I probably could. Good. Always do that in the future, please. I did. The phone call lasted less than two minutes, and was immediately effective. I suspect the manager had done this before.
Notice that he did not accuse me of being a jerk; rather, he posed two questions, which were basically: 1) are you intending to hurt the company?, and 2) are you able to act decently? There's only really one right answer to those questions. If he had just said "you are a jerk" I may have just replied "no, I'm not", but by asking questions instead, it put the onus on me to think about the answer, and triggered a moment of self-reflection.

## Additional topics

There are some additional topics I have not covered in detail here, but should mention:

- **What about jerks in open source communities?** A good reference for this is the [no more rock stars] post, where the rock star described is pretty much Bob. See that post for the section on: How do we as a community prevent rock stars?
- **What about Bob as a manager?** Bob may seek and be offered promotions into management, and become even more damaging to the company. Bob the manager exploits and threatens his subordinates. That's a topic for another post.
- **Does Bob sexually assault others?** Is Bob more likely to be a harasser due to delusions of grandeur and a sense of entitlement ([Al Capone theory]), or is he smart enough not to go that far, or, is he simply not that kind of jerk? I don't know. That's outside of my firsthand experience, so I didn't include it.
- **What about those junior engineers?** The ones Bob exploits for his own gain. That's another big topic. Some related reading here, here, and here (from which I borrowed the words "reflecting glory").
- **Is Bob actually brilliant?** It's a little hard to tell, since Bob takes credit for the work of others. Bob also creates enemies in the industry, and some staff even sabotage Bob's work. In the long run, it hurts Bob's career. Not a brilliant result, really.
- **What if Bob is pretending to be brilliant?** (Updated) The consequences for the company can be worse.
I didn't explore this topic here, but it would be a character similar to Bob who isn't actually brilliant, but pretends to be, and has most people believing him. To quote from a HN comment: "The myth of "brilliant jerks" is harmful because it lets any jerk pretend he's doing it because he's brilliant, when chances are he's just afraid of being unmasked as mediocre."

- **Does anyone exist who is really as bad as fictional Bob?** Yes. Fortunately they are rare. One reviewer thinks that my post will not be effective unless I name such a real-life Bob as a concrete example. Maybe they are right, but I've avoided that here. This isn't about one Bob, it's about all selfish brilliant jerks.
- **Should we publicly call out brilliant jerks in tech?** It's a complex topic. The book [Is Shame Necessary] does make the point that shame and humiliation have a legitimate place in society when they are natural consequences to abusive behavior, which is also discussed in this post. Calling out abusers may save future victims of abuse, so long as the calling out is proportional and not abusive itself. For victims of abuse, I could not make a blanket recommendation: I don't know your specific situation and how safe it is for you to speak up. I've been in this situation myself, and was facing an extreme threat to myself and my family, and I understand how risky it can be.
- **How do I know if I'm the jerk?** If you always think that being right is all that matters, and don't consider your impact on teamwork or relationships, you might be an Alice or a Bob. If you think that hurting other people is simply doing what it takes to get ahead, then you might be a Bob. See the earlier sections for more characteristics, and my colleague's QCon presentation: [Am I a Brilliant Jerk?]

## Conclusion

Should brilliant jerks be tolerated? To explore this, I described two fictional brilliant jerks: Alice, who is selfless, and Bob, who is selfish.
This makes it clear that the behavior of selfish jerks, like Bob, should definitely not be tolerated. Bob can kill companies. When CEOs and VCs sometimes say that brilliant jerks may be worth it, I imagine they are thinking of Alice, a selfless jerk, and not Bob. (Alice is debatable.)

Early on in my career, I supported brilliant jerks of any type and thought they were worth it. I was wrong. People had warned me about them, that their behavior was "not ok," but they never went into much detail as to why. I've shared many details here. I didn't figure this all out until seeing the behavior and damage firsthand. (I've not only experienced it, but I may have reached my lifetime dosage of asshole-rads.)

Companies can adopt a "no asshole rule", or more politely, a "no brilliant jerks" policy. Colleagues may be genuinely conflicted about how to deal with Bob: on the one hand, he is a real jerk, but on the other he is a "high performer," so isn't it in the company's best interest to tolerate his behavior? A policy helps you decide, and it can be as simple as three words: no brilliant jerks.

## Acknowledgements

Picture and editing by Deirdré Straughan. Thanks to review feedback and suggestions from Alice Goldfuss, Baron Schwartz, Ed Hunter, Justin Becker, David Blank-Edelman, Valerie Aurora, and others.

References and related reading:

- Sutton, R. The No Asshole Rule. Grand Central Publishing, 2010.
- Babiak, P., Hare, R. D. Snakes in Suits. Harper Business, 2007.
- Jacquet, J. Is Shame Necessary. Random House, 2015.
- Sutton, R. The Asshole Survival Guide. Houghton Mifflin Harcourt, 2017.
- https://hbr.org/2015/12/its-better-to-avoid-a-toxic-employee-than-hire-a-superstar
- https://retrospective.co/brilliant-jerks-cost-more-than-they-are-worth/
- http://boz.com/articles/be-kind.html
- https://hypatia.ca/2016/06/21/no-more-rock-stars/
- https://www.youtube.com/watch?v=fJOSX-W0yHA&feature=youtu.be&t=10m53s
- https://www.slideshare.net/reed2001/culture-1798664/36-Brilliant_Jerks_Some_companies_tolerate
- https://jobs.netflix.com/culture
- https://qconsf.com/sf2017/presentation/am-i-brilliant-jerk
- https://hypatia.ca/2017/07/18/the-al-capone-theory-of-sexual-harassment/
- The blood bag series: part 1, part 2, part 3
- (Updated) discussion on hackernews

[Am I a Brilliant Jerk?]: https://qconsf.com/sf2017/presentation/am-i-brilliant-jerk
[test]: http://en.wikipedia.org/wiki/The_No_Asshole_Rule
[Be Kind]: http://boz.com/articles/be-kind.html
[Code of Conflict]: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=b0bc65729070b9cbdbb53ff042984a3c545a0e34
[No Brilliant Jerks]: https://www.slideshare.net/reed2001/culture-1798664/36-Brilliant_Jerks_Some_companies_tolerate
[Brilliant Jerks Cost More Than They Are Worth]: https://retrospective.co/brilliant-jerks-cost-more-than-they-are-worth/
[abc]: http://www.recode.net/2017/2/22/14700114/is-uber-lost
[No Asshole Rule]: http://amzn.to/2zitvVd
[Al Capone theory]: https://hypatia.ca/2017/07/18/the-al-capone-theory-of-sexual-harassment/
[no more rock stars]: https://hypatia.ca/2016/06/21/no-more-rock-stars/
[memo]: https://jobs.netflix.com/culture
[slideshare account]: https://www.slideshare.net/reed2001/culture-1798664/36-Brilliant_Jerks_Some_companies_tolerate
[when is naming abuse itself abusive]: https://blog.valerieaurora.org/2016/10/24/when-is-naming-abuse-itself-abusive/
[Paul Graham]: https://youtu.be/UacbJ72dluU?t=4m18s
[CEOs]: http://www.businessinsider.com/dick-costolo-on-why-companies-hire-brilliant-jerks-2017-5
[PCL-R]: https://www.sociopathicstyle.com/psychopathic-traits/
[Is Shame Necessary]: https://www.amazon.com/Shame-Necessary-New-Uses-Tool/dp/0307950131

CMake ExternalProject_add In Libraries /dev/dump

First off, I'm a developer of open source application libraries, some of which are fairly popular.

TLDR: Library developers should not use ExternalProject_Add, but should instead rely on find_package(), demanding that their downstream developers pre-install their dependencies.

I recently decided to try to add TLS v1.2 support to one of my messaging libraries, which is written in C and configured via CMake.



The best way for me to do this -- so I thought -- would be to add a dependency in my project using a sub project, bringing in a 3rd party (also open source) library -- Mbed TLS.

Now the Mbed TLS project is also configured by CMake, so you'd think it would be relatively straightforward to include their work in my own.  You'd be mistaken.

CMake includes a capability, called ExternalProject, for configuring external projects -- even downloading their source code (or checking it out via git).

This looks super handy -- and it almost is.  (And for folks using CMake to build applications I'm sure this works out well indeed.)

Unfortunately, this facility needs a lot of work still -- it only runs at build time, not configuration time.

It also isn't immediately obvious that ExternalProject_Add() just creates the custom target, without making any dependencies upon that target.  I spent a number of hours trying to understand why my ExternalProject was not getting configured.  Hip hip hurray for CMake's amazing debugging facilities... not.  It's sort of like trying to debug some bastard mix of m4, shell, and Python.  Hint: add_dependencies() is the clue you need; may this knowledge save you the hours its lack cost me.  Otherwise, enjoy the spaghetti.
Bon Appétit, CMake lovers!
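For the record, here's roughly the shape of the fix -- a sketch, with the repository URL, tag, and target names being illustrative rather than taken from my actual project:

```cmake
# Pull in Mbed TLS as an external project.  Note that this runs at
# *build* time, not configure time -- the very limitation complained
# about above.
include(ExternalProject)

ExternalProject_Add(mbedtls_ext
    GIT_REPOSITORY  https://github.com/ARMmbed/mbedtls.git
    GIT_TAG         mbedtls-2.6.0           # illustrative tag
    CMAKE_ARGS      -DENABLE_PROGRAMS=OFF -DENABLE_TESTING=OFF
    INSTALL_COMMAND ""                      # build in place; don't install
)

add_library(mylib SHARED mylib.c)

# The crucial, non-obvious step: ExternalProject_Add() only *creates*
# the custom target.  Nothing depends on it -- and so nothing triggers
# its configure/build -- until you say so explicitly:
add_dependencies(mylib mbedtls_ext)
```

Without that last line, the external project sits there inert, which is exactly the failure mode that cost me those hours.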

So once you're configuring the dependent library, how are you going to link your own library against the dependent?

Well, if you're building an application, you just link (hopefully statically), have the link resolved at compile time, and forget about it forever more.

But if you're building a library the problem is harder.  You can't include the dependent library directly in your own.  There's no portable way to "merge" archive libraries or even dynamic libraries.

Basically, your consumers are going to be stuck having to link against the dependent libraries as well as your own (and in the right order too!)  You want to make this easier for folks, but you just can't. 
(My kingdom for a C equivalent to the Golang solution to this problem.  No wonder Pike et al. got fed up with C and invented Go!)

And Gophers everywhere rejoiced!

Making matters worse, the actual library (or more, as in the aforementioned TLS software there are actually 3 separate libraries -- libmbedcrypto, libmbedx509, and libmbedtls) is located somewhere deeply nested in the build directory.   Your poor consumers are never gonna be able to figure it out.

There are two solutions:

a) Install the dependency as well as your own library (and tell users where it lives, perhaps via pkgconfig or somesuch).

b) Just forget about this and make users pre-install the dependency explicitly themselves, and pass the location to your configuration tool (CMake, autotools, etc.) explicitly.

Of these two, "a" is easier for end users -- as long as the application software doesn't also want to use functions in that library (perhaps linking against a *different* copy of the library).  If this happens, the problem can become kind of intractable to solve.
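If you do go with option "a", a pkg-config file is the least-bad way to hand consumers the full link line in the correct order.  A sketch -- all names, paths, and version numbers here are made up for illustration:

```
# mylib.pc -- illustrative sketch
prefix=/usr/local
libdir=${prefix}/lib
includedir=${prefix}/include

Name: mylib
Description: Messaging library (links against Mbed TLS)
Version: 1.0.0
Libs: -L${libdir} -lmylib -lmbedtls -lmbedx509 -lmbedcrypto
Cflags: -I${includedir}
```

Consumers then get the ordering for free via `pkg-config --libs mylib`, instead of having to discover the three Mbed TLS archives buried in your build tree.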

So, we basically punt, and make the user deal with this.  Which these days, on many systems, is handled by packaging systems like debian, pkg-add, and brew.

After having worked in Go for so long (and admittedly in kernel software, which has none of these silly userland problems), the current state of affairs here in C is rather disappointing.

Does anyone out there have any other better ideas to handle this (I mean besides "develop in Y", where Y is some language besides C)?

Licensing... again.... /dev/dump

Let me start by saying this... I hate the GPL.  Oh yeah, and a heads up, I am just a software engineer, and not a lawyer.  Having said that....

I've released software under the GPL, but I never will again.  Don't get me wrong, I love open source, but the GPL's license terms are unaccountably toxic, creating an island that I am pretty sure the original GPL authors never intended.


My Problem....


So I started by wanting to contemplate a licensing change for a new library I'm working on, to move from the very loose and liberal MIT license, to something with a few characteristics I like -- namely patent protection and a "builtin" contributor agreement.   I'm speaking of course of the well-respected and well-regarded Apache License 2.0.

The problem is, I ran into a complete and utter roadblock.

I want my software to be maximally usable by as many folks as possible.

There is a large installed base of software released under the GPLv2.  (Often without the automatic upgrade clause.)

Now I'm not a big fan of "viral licenses" in general, but I get that folks want to have a copy-left that prevents folks from including their work in closed source projects.  I get it, and it's not an entirely unreasonable position to hold, even if I think it limits adoption of such licensed software.

My problem is that the GPLv2's terms are incredibly strict, prohibiting any other license terms being applied by any other source in the project.  This means that you can't mix GPLv2 with pretty much anything else, except the very most permissive licenses.  The Apache License's patent grant & protection clauses break GPLv2 compatibility.  (In another, older circumstance, the CDDL had similar issues, which block ZFS from being distributed with the Linux kernel proper.  The CDDL also had a fairly benign choice-of-venue clause for legal action, which was also deemed incompatible with the GPLv2.)

So at the end of the day, GPLv2 freezes innovation and has limited my own actions because I would like to enable people who have GPLv2 libraries to use my libraries.  We even have an ideological agreement -- the FSF actually recommends the Apache License 2.0!  And yet I can't use it; I'm stuck with a very much inferior MIT license in order to let GPLv2 folks play in the pool.

Wait, you say, what about the GPLv3?  It fixed these incompatibilities, right?   Well, yeah, but then it went and added other constraints on use which are even more chilling than the GPLv2's.  (The anti-Tivoization clause, which is one of the more bizarre things I've seen in any software license, applies only to equipment intended primarily for "consumer premises".  What??)

The GPL is the FOSS movement's worst enemy, in my opinion.  Sure, Linux is everywhere, but I believe that this is in spite of the GPLv2 license, rather than a natural by-product of it.  The same result could have been achieved under a liberal license, or a file-based copyleft.

GPL in Support of Proprietary Ecosystems


In another turn of events, the GPL is now being used by commercial entities in a bait-and-switch.  In this scheme, they hook the developer on their work under the GPL.  But when the developer wants to add some kind of commercial capability and keep that source confidential, they cannot do so -- unless they pay the original author a fee for a special commercial license.    For a typical example, have a look at the WolfSSL license page.

Now all that is fine and dandy, and legal as you please.  But, in this case, the GPL isn't being used to promote open source at all.  Instead, it has become an enabler for the monetization of closed source, and frankly leads to a richer proprietary software ecosystem.  I don't think this is what the original GPL authors intended.

Furthermore, because the author of this proprietary software needs to be able to relicense the code under commercial terms, they are very very unlikely to accept contributions from third parties (e.g. external developers) -- unless those contributors are willing to perform a copyright assignment or sign a contributor agreement giving the commercial entity very broad relicensing rights.

So instead of becoming an enabler for open collaboration, the GPL just becomes another tool in the pockets of commercial interests.

The GPL Needs to Die

If you love open source, and you want to enhance innovation, please, please don't license your stuff under the GPL unless you have no other choice.  If you can relicense your work under other terms, please do so!  Look for a non-viral license with the patent protections needed by both you and your downstreams.  I recommend either the Mozilla Public License (if you need a copyleft on your own code), or the Apache License (which is liberal but offers better protections than BSD or MIT or similar alternatives).

Selecting relay smarthosts and using SMTP AUTH on illumos The Trouble with Tribbles...

A problem I looked at recently involved configuring a system to send (relay) email via a customer's own SMTP servers. There are 2 parts to this: selecting the right relay (smarthost) for a given sender, and authenticating to that relay using SMTP AUTH.


Search for SMTP AUTH with sendmail on illumos or Solaris, and you invariably end up with advice on how to build Cyrus SASL and sendmail from scratch.

For example, Andrew has some good instructions.

However, if you look at the sendmail we ship on illumos you'll find that it's already been built with SASLv2 support:

# /usr/lib/sendmail -bt -d0.1 < /dev/null
Version 8.14.4+Sun
 Compiled with: DNSMAP LDAPMAP LOG MAP_REGEX MATCHGECOS MILTER MIME7TO8
        MIME8TO7 NAMED_BIND NDBM NETINET NETINET6 NETUNIX NEWDB NIS
        PIPELINING SASLv2 SCANF STARTTLS TCPWRAPPERS USERDB
        USE_LDAP_INIT XDEBUG

And, if you telnet to port 25 and look at the EHLO response it includes:

250-AUTH GSSAPI DIGEST-MD5 CRAM-MD5

However, that's not actually the part we want here (but I'll come back to that later). I don't want to authenticate against my own server, I need my system to authenticate against a remote server.

Back to the problem at hand.

The first part - selecting the right smarthost - can be achieved using smarttable. All you need is the smarttable.m4 file, and then build a configuration using it by enabling the smarttable feature.

The second part, SMTP AUTH, should also be very simple. Again, it's all documented, and just involves enabling the authinfo feature. But wait - on illumos, there is no authinfo.m4 file, so that won't work.

In fact, it can be made to work. What you need to do is download the sendmail source, unpack it, and there in the cf/feature directory you'll find the authinfo.m4 file.

OK, so copy both files - smarttable.m4 and authinfo.m4 - into the /etc/mail/cf/feature directory on a server. Copy and edit the sendmail.mc file (I'm going to copy it to /tmp and edit it there) to add the 2 feature lines, like this fragment of the file here:

...
define(`confFALLBACK_SMARTHOST', `mailhost$?m.$m$.')dnl
FEATURE(`authinfo')dnl
FEATURE(`smarttable')dnl
MAILER(`local')dnl
...

Basically, just add the features above the MAILER line. Then compile that:

cd /etc/mail/cf/cf
m4 ../m4/cf.m4 /tmp/sendmail.mc > /tmp/sendmail.cf

That's your new sendmail.cf ready. It uses 2 databases in /etc/mail; to create these (initially empty):

cd /etc/mail
touch smarttable
touch authinfo
makemap hash smarttable < smarttable
makemap hash authinfo < authinfo

then copy your new sendmail.cf into /etc/mail and restart sendmail

cp /tmp/sendmail.cf /etc/mail
svcadm restart sendmail

So far so good, but what should those files look like?

First the smarttable file, which is just a map of sender to relay host. For example, it might just have:

my.name@gmail.com smtp.gmail.com

Which means that if I want my home system to send out mail with my address on it, it should route it through gmail's servers rather than trying to deliver it direct (and likely getting marked as spam).

Then the authinfo file, which looks like

Authinfo:smtp.gmail.com "U:root" "I:my.name@gmail.com" "P:mypassword" "M:LOGIN PLAIN"
Authinfo:smtp.gmail.com:587 "U:root" "I:my.name@gmail.com" "P:mypassword" "M:LOGIN PLAIN"
(There are just 2 lines there, starting with Authinfo:, even if the blog shows it wrapped.)

Basically, for gmail, you need to supply your email address as the identifier and your password as, well, the password. (Note: if you've got two-factor authentication set up, you'll need to set up an app key.)

Of course, the authinfo files ought to be readable only by root, otherwise anyone on your system can read your password in the clear.
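A sketch of locking the files down (shown in a temporary directory; on the real system they live in /etc/mail, and the .db file makemap generates needs the same treatment):

```shell
cd "$(mktemp -d)"
# Stand-ins for /etc/mail/authinfo and the compiled map
touch authinfo authinfo.db
chmod 600 authinfo authinfo.db   # owner (root) read/write only
ls -l authinfo | cut -c1-10      # prints: -rw-------
```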

There are a couple of non-standard tweaks you'll need for gmail to work. First, you need to go to your gmail account settings and allow less secure apps. Second, you will need the "M:LOGIN PLAIN" entry in the authinfo file, else you'll get an "available mechanisms do not fulfill requirements" error back.

Redo the two makemap commands above and you're good to go.

That's SMTP AUTH the one way. At which point you're probably thinking, can we authenticate against an illumos sendmail using SMTP AUTH?

The answer, sadly, is no. At least as far as I can tell. While our sendmail is built correctly against SASLv2, illumos doesn't seem to ship enough supporting bits of the SASL infrastructure to make this work. You should be able to create the file /etc/sasl/Sendmail.conf to configure it. Unfortunately the only pwcheck_method available is auxprop (shadow, which would allow you to authenticate against local system accounts, isn't available; nor is the saslauthd method, and no saslauthd daemon is shipped anyway). Worse, illumos has no auxprop plugins, so the whole thing is rather useless. Note that rebuilding sendmail alone won't fix this, as the problem is in the underlying sasl implementation.
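For reference, that configuration file would look something like this (a hypothetical sketch using Cyrus SASL's application config syntax; as noted above, the lack of auxprop plugins means it won't actually get you a working setup on illumos):

```
# /etc/sasl/Sendmail.conf (hypothetical)
pwcheck_method: auxprop
mech_list: DIGEST-MD5 CRAM-MD5
```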

The above notes were developed on Tribblix, but ought to apply to any illumos distribution using the vanilla illumos sendmail+sasl combination.

IPMItool for Termux (Android / Chromebook) Nahum Shalman

This is a follow up to my previous post outlining my chromebook setup.

I managed to get IPMItool to compile in my Termux environment. Here's how you can too. (So far I've only been able to get this to work on an x86 Chromebook, but not on ARM phones.)

apt update  
apt upgrade  
apt install coreutils  
pkg upgrade  
pkg install termux-tools proot util-linux net-tools  
pkg install openssh tracepath tree git  
termux-setup-storage  
termux-chroot  
pkg install automake autoconf libtool sed grep clang openssl-dev readline-dev  
git clone https://git.savannah.gnu.org/git/gnulib.git  
git clone https://github.com/nshalman/ipmitool-source  
cd ipmitool-source  
./bootstrap-gnulib
./bootstrap
./configure
make  

Building illumos-gate on AWS The Trouble with Tribbles...

Having talked about running Tribblix on AWS, one of the things that would be quite neat would be to be able to build illumos-gate.

This is interesting because it's a relatively involved process, and might require proper resources - it's not really possible to build illumos inside VirtualBox, for instance, and many laptops don't run illumos terribly well. So it's hard for the average user to put together a decent - most likely dedicated - rig capable of building or developing illumos, which is clearly a barrier to contribution.

Here's how anyone can build illumos, using Tribblix.

Build yourself an EC2 instance as documented here, with 2 changes:

  1. The instance type should be m4.large or bigger - m4.xlarge or c4.xlarge would be better. The bigger the instance, the quicker the build, but m4.large is pretty much the minimum size.
  2. Attach an EBS volume to the instance, at least 8G in size. If you want to do multiple builds, or do lint or debug builds, then it has to be larger. I attach the volume as /dev/sdf, which is assumed below. (You could keep the volume around to persist the data, of course.)
Once booted, log in as root. You then need to set up the zfs pool (the disk showing up as c2t5d0 below matches the /dev/sdf attachment point) and create a couple of file systems that can be used to host the build zone and store the build.

zpool create storage c2t5d0
zfs set compression=lz4 storage
zfs destroy rpool/export/home
zfs create -o mountpoint=/export/home storage/home
zfs create -o mountpoint=/export/zones storage/zones

You should then do an update to ensure packages are up to date, and install the develop overlay to get you some useful tools.

zap refresh
zap update-overlay -a
zap install-overlay develop

Then create a user, which you're going to use to do the build. For me, that is:

groupadd -g 10000 it
useradd -g it -u 11730 -c "Peter Tribble" -s /bin/tcsh \
  -d /export/home/ptribble ptribble
mkdir -p /export/home/ptribble
chown -hR ptribble:it /export/home/ptribble
passwd ptribble

Then create a build zone. It needs an IP address; just pick any unused private address (I simply use the address above that of the global zone, which you can get with ifconfig or from the AWS console - note that it's the private address, not the public IP that you ssh to).

zap create-zone -z illumos-build -t whole \
  -i 172.xxx.xxx.xxx -o develop \
  -O java -O illumos-build -U ptribble

What does this do? It creates a new zone, called illumos-build. It's a whole root zone, with its own exclusive set of file systems. The IP address is 172.xxx.xxx.xxx. The develop overlay is installed (in this case, copied from the global zone); the java and illumos-build overlays are added to this new zone (note the upper-case -O here). Finally, the user account ptribble is shared with the zone.

Give that a few seconds to boot and log in to it, then a couple of tweaks that are necessary for illumos to build without errors.

zlogin illumos-build
rm /usr/bin/cpp
cd /usr/bin ; ln -s ../gnu/bin/xgettext gxgettext

Now log out and log back in to the instance as your new user. We're going to create somewhere to store the files, and check out the source code.

mkdir Illumos
cd Illumos
git clone git://github.com/illumos/illumos-gate.git
wget -c \
  https://download.joyent.com/pub/build/illumos/on-closed-bins.i386.tar.bz2 \
  https://download.joyent.com/pub/build/illumos/on-closed-bins-nd.i386.tar.bz2

Now we set up the build.

cd illumos-gate
bzcat ../on-closed-bins.i386.tar.bz2 | tar xf -
bzcat ../on-closed-bins-nd.i386.tar.bz2 | tar xf -
cp usr/src/tools/scripts/nightly.sh .
chmod +x nightly.sh

There are two more files we need. Go to the tribblix-build repo and look in the illumos directory there. Grab one of the illumos.sh files from there and put it into your illumos-gate directory with the name illumos.sh. If you need to change how the build is done, this is the file to edit (but start from one of those files so you get one appropriate for Tribblix as the host). Also, grab Makefile.auditrecord and use it to replace usr/src/cmd/auditrecord/Makefile.

Now log in to the zone and start the build.

pfexec zlogin -l ptribble illumos-build
cd Illumos/illumos-gate
time ./nightly.sh illumos.sh

On an m4.xlarge instance, this took me just under 75 minutes. Look in the log directory and check that the mail_msg looks clean without errors, and you'll have the built files in the proto directory and an IPS repo under packages.

For more behind the scenes details on the illumos build process itself, look at the how to build illumos page.

Public Tribblix AMI now available The Trouble with Tribbles...

There's now a public Tribblix AMI available to run on AWS.

This was built according to the notes I gave earlier, and is part of making Tribblix the illumos for everyone.

This is to be considered slightly experimental, and there are a couple of constraints:

First, the AMI is only available in the London region for now (I'm in the UK, so that's where I'm running things). I could make it available elsewhere, but there are costs associated with doing so and, as everything related to Tribblix comes out of my own pocket, I'm not going to incur costs unless there's a demonstrable need. If you want to run in a different region, then you can always copy the AMI.

Second, the size of the image is quite small. Again, there's a constraint on cost. But the idea here is that you wouldn't store any non-trivial data in the image itself - you would create an appropriately sized EBS volume, attach that and create a zfs pool for your data. The Tribblix repo server does just that - the package repo lives on the second pool.

So, how to use this? I'm going to assume some level of AWS familiarity, that you have an account and know basically how to use AWS, and that your account is set up with things like an ssh key pair.

Go to the AWS console, and navigate to the EC2 dashboard. Unless you've copied the AMI to the region of your choice, make sure you're working in London - the dropdown is in the top right:


Then hit the launch instance button:


Now you get to choose an Amazon Machine Image (AMI). Click on "Community AMIs" and enter "Tribblix" or "illumos" into the "Search community AMIs" search box. At the time of writing, you'll only get one result, but more may appear in future:



OK, go and select that one. Then you can Choose an Instance Type. A great thing about Tribblix is that it's pretty lightweight, so the t2.micro - available on the free tier - is a good choice.



Click on "Review and Launch". On the next screen you can edit the storage to add an additional volume, but the one thing you must do is edit the security group.



If you leave it like that, you'll have no way to access it. So Edit it, and the simplest thing to do at this point is to create a new security group that allows ssh only, with the source being your own IP address, which you can get by selecting "My IP" from the source dropdown.



(I've got a saved security group that does just that, to let me straight in from home.)

Click on "Review and Launch" to go back to the main screen, and then "Launch". This is when you get to choose which key pair to use to log in to your instance:



It will take a little while to start (although it's usually ready before the status checks say so), and you should then be able to ssh in to it (as root, with the key pair you set up).

ssh -i peter1-london.pem \
root@ec2-35-176-237-204.eu-west-2.compute.amazonaws.com

And you're good to go. What you do then is up to you; I'll cover some scenarios in upcoming posts. Be aware that the base AMI has a pretty minimalist set of packages installed, so you probably want to add some more packages or overlays to do anything useful.

EuroBSDcon: System Performance Analysis Methodologies Brendan Gregg's Blog

For my first trip to Paris I gave the closing keynote at [EuroBSDcon 2017] on performance methodologies, using FreeBSD 11.1 as an analysis target. In the past I've shared similar methodologies applied to other operating systems, and finished porting them to BSD for this talk. It was a few days of work, which is really not bad. That's a virtue of these methodologies: once you learn them, you can apply them to anything throughout your career, and it doesn't take too much time to re-apply them. The video is on youtube:

And the slides are on slideshare:
FreeBSD has an excellent range of analysis tools, and this was an opportunity to show them off. Among the new content I developed for the talk was a **FreeBSD performance checklist**:

1. ```uptime``` → load averages
2. ```dmesg -a | tail``` → kernel errors
3. ```vmstat 1``` → overall stats by time
4. ```vmstat -P``` → CPU balance
5. ```ps -auxw``` → process usage
6. ```iostat -xz 1``` → disk I/O
7. ```systat -ifstat``` → network I/O
8. ```systat -netstat``` → TCP stats
9. ```top``` → process overview
10. ```systat -vmstat``` → system overview

I also developed a new tool to support my [thread state analysis] methodology on FreeBSD, [tstates.d]:
# ./tstates.d
Tracing scheduler events... Ctrl-C to end.
^C
Time (ms) per state (read script for info):
COMM             PID       CPU  RUNQ    SLP    USL   SUS   SWP   LCK   IWT   YLD
irq15: ata1      12          0     0      0      0     0     0     0 15024     0
[...]
sleep            877         0     0    505      0     0     0     0     0     0
bufdaemon        19          0    11      0  15057     0     0     0     0     0
sleep            879         0     0   2614      0     0     0     0     0     0
devd             523         0     0  15024      0     0     0     0     0     0
syncer           21          1     9      0  15055     0     0     0     0     0
fsck_ufs         878         1     0      0     10     0     0     0     0     0
fsck             836         1     0     12      0     0     0     0     0     0
dd               883         2     0      0      0     0     0     0     0     0
bufspacedaemon   20          3     5      0  15019     0     0     0     0     0
dtrace           873         3    23  15980      0     0     0     0     0     0
sh               881         4     0      3      1     0     0     0     0     0
csh              865         5     7  13882      0     0     0     0     0     0
rand_harvestq    6           8    20      0  15846     0     0     0     0     0
kernel           0          29    15      0      0     0     0     0     0     0
cam              4          45    14      0      0     0     0     0     0     0
sshd             863        52    85  13757      0     0     0     0     0     0
intr             12         79   192      0      0     0     0     0     0     0
cksum            876      1591   177      0    234     0     0     0     0     0
idle             11      14114  1902      0      0     0     0     0     0     0
This tool breaks down thread time into different states by tracing scheduler events (which can have noticeable overhead: measure in a lab environment before use). The states are:

- **CPU**: on-CPU
- **RUNQ**: Waiting on a CPU run queue
- **SLP**: Interruptible sleep
- **USL**: Uninterruptible sleep (eg, disk I/O)
- **SUS**: Suspended
- **SWP**: Swapped
- **LCK**: Waiting for a lock
- **IWT**: Waiting for an interrupt
- **YLD**: Yield

I added USL since the talk to split out disk I/O from the sleep state. The output above includes a ```sleep 0.5``` command, and a ```cksum```. EuroBSDcon was a great conference, and I had a lot of fun catching up with the BSD folk and meeting new people. If you missed my talk, you can see it online above, and I hope you find it useful.

[EuroBSDcon 2017]: https://2017.eurobsdcon.org/about/
[thread state analysis]: http://www.brendangregg.com/tsamethod.html
[tstates.d]: https://github.com/brendangregg/DTrace-tools/blob/master/sched/tstates.d