I have recently done some work on improving the MariaDB 10.4 package that is part of the OmniOS extra package repository, to add more features and to make it easier to deploy. Part of that work involved adding support for socket authentication which makes the default installation more secure.
Before you can create a sparse zone, the zone brand must be installed and you’ll need a ZFS dataset to act as a zone container. If you’ve used zones before, you might already have these in place.
gz# pkg install zones brand/sparse
gz# zfs create -o mountpoint=/zones rpool/zones
gz# zonecfg -z database
database: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:database> create -t sparse
zonecfg:database> set zonepath=/zones/database
zonecfg:database> add net
zonecfg:database:net> set physical=database0
zonecfg:database:net> set global-nic=switch10
zonecfg:database:net> set allowed-address=172.27.10.7/24
zonecfg:database:net> set defrouter=172.27.10.254
zonecfg:database:net> end
zonecfg:database> add attr
zonecfg:database:attr> set name=resolvers
zonecfg:database:attr> set type=string
zonecfg:database:attr> set value=188.8.131.52
zonecfg:database:attr> end
zonecfg:database> add attr
zonecfg:database:attr> set name=domain-name
zonecfg:database:attr> set type=string
zonecfg:database:attr> set value=omnios.org
zonecfg:database:attr> end
zonecfg:database> verify
zonecfg:database> exit
gz# zoneadm -z database install
A ZFS file system has been created for this zone.
        Image: Preparing at /zones/database/root.
 Sanity Check: Looking for 'entire' incorporation.
    Publisher: Using omnios (https://pkg.omnios.org/r151032/core).
    Publisher: Using extra.omnios (https://pkg.omnios.org/r151032/extra/).
        Cache: Using /var/pkg/publisher.
   Installing: Packages (output follows)
Packages to install: 200
Mediators to change:   4
 Services to change:   6

DOWNLOAD                    PKGS         FILES    XFER (MB)   SPEED
Completed                200/200     1476/1476      4.9/4.9  1.1k/s

PHASE                                          ITEMS
Installing new actions                     5869/5869
Updating package state database                 Done
Updating package cache                           0/0
Updating image state                            Done
Creating fast lookup database                   Done
  Postinstall: Copying SMF seed repository ... done.
         Done: Installation completed in 56.395 seconds.
gz# zoneadm -z database boot
gz# zlogin database
Check IP connectivity:
root@database:~# ipadm
ADDROBJ           TYPE     STATE        ADDR
lo0/v4            static   ok           127.0.0.1/8
database0/_a      from-gz  ok           172.27.10.7/24
lo0/v6            static   ok           ::1/128
root@database:~# ping google.com
google.com is alive
root@database:~# pkg list -a '*mariadb*'
NAME (PUBLISHER)                           VERSION           IFO
ooce/database/mariadb-103 (extra.omnios)   10.3.21-151032.0  ---
ooce/database/mariadb-104 (extra.omnios)   10.4.11-151032.0  ---
root@database:~# pkg install mariadb-104
           Packages to install:  2
            Mediators to change:  1
             Services to change:  3
        Create boot environment: No
 Create backup boot environment: No

Release Notes:
  --------------------------
  MariaDB Installation Notes
  --------------------------

  When the mariadb service is started for the first time, an initial
  database will be set up and two all-privilege accounts will be created.

  One is root@localhost, it has no password, but you need to be system
  'root' user to connect. Use, for example, 'sudo mysql'

  The second is mysql@localhost, it has no password either, but you need
  to be the system 'mysql' user to connect.

  You may wish to review the default configuration file at
  /etc/opt/ooce/mariadb-<version>/my.cnf before starting the service
  for the first time.
  --------------------------

DOWNLOAD                    PKGS         FILES    XFER (MB)   SPEED
Completed                    2/2       694/694    52.0/52.0  5.8M/s

PHASE                                          ITEMS
Installing new actions                       991/991
Updating package state database                 Done
Updating package cache                           0/0
Updating image state                            Done
Creating fast lookup database                   Done
Updating package cache                           3/3
root@database:~# svcadm enable mariadb104
root@database:~# mysql
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 8
Server version: 10.4.11-MariaDB OmniOS MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> select current_user() from dual;
+----------------+
| current_user() |
+----------------+
| root@localhost |
+----------------+
1 row in set (0.000 sec)
Socket authentication is in use by default, which can be checked by verifying that root has an invalid (non-matchable) password hash.
MariaDB [(none)]> select user, password from mysql.user where user != '';
+-------+----------+
| User  | Password |
+-------+----------+
| root  | invalid  |
| mysql | invalid  |
+-------+----------+
2 rows in set (0.001 sec)
I recently ran some benchmarks on a Threadripper 3960X system and the results surprised me quite a bit. Simplified: the throughput the benchmark recorded went down from 341 MB/s on a MacBook Pro to 136 MB/s on the Threadripper desktop. I had previously read Daniel Lemire’s notes on the suboptimal performance of simdjson on Zen 2, which the benchmark uses heavily, but the drop he suggested was a few percent, not half.
Long story short, this made me curious what caused this. First stop: perf.
Notice the first item? It is crossbeam_channel::flavors::array::Channel<T>::recv. Oh my, I had never seen that one hog so much CPU time; in fact, we spend more time receiving from the channel than we spend parsing or serializing JSON!
Let’s add a bit of Threadripper trivia. Instead of a single piece of silicon, the design AMD went with splits the CPU into multiple small dies, called CCDs, which in turn consist of two CCXs that contain the cores and the level 1 to 3 caches. So let’s look at another thing, htop (a trusty little tool to show our load):
In this screenshot we can spot that one thread seems to be running on the 5th core, one on the 16th and one on the 19th and 20th. Thinking back to the design of the Threadripper, this is a bit of a hint: those cores are on different CCXs, and even further, on different CCDs. So what happens if they were all on the same one?
Boom 400+ MB/s!
taskset -c 0,1,2 does the trick. That’s a really nice improvement, and looking at the perf output we can see recv move from nearly 11% of CPU time down to 7.28%. Now that’s a neat improvement: not only is it nearly 3x faster than the first benchmark, it is also 20% faster than on the laptop. So far so good.
But it still leaves the question of why, and whether we can do something about it. Enter a little benchmark; let’s look at what it puts out for the first core (it’s a lot of output otherwise).
B 0 -  0: -
B 0 -  1:  818us/send
B 0 -  2:  673us/send
B 0 -  3: 2839us/send
B 0 -  4: 2421us/send
B 0 -  5: 2816us/send
B 0 -  6: 3466us/send
B 0 -  7: 3634us/send
B 0 -  8: 3267us/send
B 0 -  9: 3042us/send
B 0 - 10: 3633us/send
B 0 - 11: 3535us/send
B 0 - 12: 3334us/send
B 0 - 13: 3443us/send
B 0 - 14: 3348us/send
B 0 - 15: 3398us/send
B 0 - 16: 3459us/send
B 0 - 17: 3108us/send
B 0 - 18: 3287us/send
B 0 - 19: 3393us/send
B 0 - 20: 3369us/send
B 0 - 21: 3248us/send
B 0 - 22: 3290us/send
B 0 - 23: 3323us/send
B 0 - 24:  487us/send
B 0 - 25:  812us/send
B 0 - 26:  676us/send
B 0 - 27: 2859us/send
B 0 - 28: 2853us/send
B 0 - 29: 2864us/send
B 0 - 30: 3475us/send
B 0 - 31: 3620us/send
B 0 - 32: 3582us/send
B 0 - 33: 3497us/send
B 0 - 34: 3524us/send
B 0 - 35: 3488us/send
B 0 - 36: 3331us/send
B 0 - 37: 3303us/send
B 0 - 38: 3365us/send
B 0 - 39: 3333us/send
B 0 - 40: 3324us/send
B 0 - 41: 3363us/send
B 0 - 42: 3554us/send
B 0 - 43: 3351us/send
B 0 - 44: 3207us/send
B 0 - 45: 3240us/send
B 0 - 46: 3377us/send
B 0 - 47: 3275us/send
First things first: the numbers here are 0-indexed, unlike in htop where they’re 1-indexed, so core 0 here means core 1 in htop. The test runs only for a second per core combination (it goes through all pairs, which gets really slow really fast and would otherwise take a very long time), so some variation is to be expected. We can see that cores 24-47 are the SMT siblings of the physical cores 0-23, 24 being the second thread on core 0. The second observation is that cores 0-2 are in the same CCX, where performance is reasonably fast. Cores 3-5 seem to be on the same CCD, and so on.
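The measurement itself is conceptually simple: for each pair of cores, time how long a send over a bounded channel takes. Here is a minimal std-only sketch of the timing loop; it is my reconstruction, not the actual benchmark, which uses a bounded crossbeam channel and pins the two threads to specific cores (for example via the core_affinity crate), and the function name and message count are made up for illustration:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Instant;

// Time the average cost of a send over a small bounded channel.
// The real benchmark additionally pins the sender and receiver to a
// specific core pair before measuring, which is what exposes the
// CCX/CCD topology.
fn time_per_send(n: u64) -> f64 {
    // Small bounded buffer, so the sender regularly blocks on the
    // receiver, just like a bounded crossbeam channel would.
    let (tx, rx) = mpsc::sync_channel::<u64>(8);
    let receiver = thread::spawn(move || {
        // Drain everything the sender pushes.
        while rx.recv().is_ok() {}
    });
    let start = Instant::now();
    for i in 0..n {
        tx.send(i).unwrap();
    }
    drop(tx); // close the channel so the receiver exits
    receiver.join().unwrap();
    start.elapsed().as_micros() as f64 / n as f64
}

fn main() {
    println!("{:.3}us/send", time_per_send(100_000));
}
```

On a machine where both threads land in the same CCX this number stays small; scheduled across CCDs it grows, which is exactly the pattern in the table above.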
Let’s look at the code for the crossbeam channel. The interesting part is that both head and tail are wrapped in CachePadded. Fortunately I have a friend who keeps going on about false sharing whenever performance becomes a topic, so that was a really good hint here. Looking through the struct, aligning head and tail to the cache line makes a lot of sense: they’re frequently accessed from both sides of the queue. But there is another part that’s frequently used on both sides: the buffer, and that is just an array of T, so it might not align well to cache lines. In other words, if we access buffer[x] we might invalidate buffer[x+1] (or more). So what happens if we wrap the elements in a CachePadded? The result looks quite nice; it cut the time down by 50% when going over CCX boundaries:
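The effect of padding can be demonstrated in isolation with a small std-only sketch: two threads increment adjacent atomic counters, once packed next to each other (almost certainly sharing a cache line) and once with each counter forced onto its own 64-byte line via #[repr(align(64))], which is essentially what crossbeam’s CachePadded does (with platform-dependent alignment). The names here are mine, not crossbeam’s:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;
use std::time::Instant;

// A CachePadded-style wrapper: the alignment guarantees each slot
// occupies its own 64-byte cache line, so a write to one slot can
// no longer invalidate the line holding its neighbour.
#[repr(align(64))]
struct Padded(AtomicU64);

// Two threads hammer two counters concurrently; returns elapsed seconds.
fn hammer(a: &'static AtomicU64, b: &'static AtomicU64, iters: u64) -> f64 {
    let start = Instant::now();
    let t0 = thread::spawn(move || {
        for _ in 0..iters {
            a.fetch_add(1, Ordering::Relaxed);
        }
    });
    let t1 = thread::spawn(move || {
        for _ in 0..iters {
            b.fetch_add(1, Ordering::Relaxed);
        }
    });
    t0.join().unwrap();
    t1.join().unwrap();
    start.elapsed().as_secs_f64()
}

fn main() {
    let iters = 5_000_000;
    // Adjacent counters: 16 bytes apart, so on the same cache line.
    let shared: &'static [AtomicU64; 2] =
        Box::leak(Box::new([AtomicU64::new(0), AtomicU64::new(0)]));
    // Padded counters: guaranteed to sit on different lines.
    let padded: &'static [Padded; 2] =
        Box::leak(Box::new([Padded(AtomicU64::new(0)), Padded(AtomicU64::new(0))]));
    let t_shared = hammer(&shared[0], &shared[1], iters);
    let t_padded = hammer(&padded[0].0, &padded[1].0, iters);
    println!("same line: {:.3}s, padded: {:.3}s", t_shared, t_padded);
}
```

The gap between the two timings grows with the coherence distance between the cores the threads land on, which is why the Threadripper suffers so much more than a single-CCX part.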
B 0 -  0: -
B 0 -  1:  630us/send
B 0 -  2:  678us/send
B 0 -  3: 1319us/send
B 0 -  4: 1256us/send
B 0 -  5: 1291us/send
B 0 -  6: 1438us/send
B 0 -  7: 1504us/send
B 0 -  8: 1525us/send
B 0 -  9: 1660us/send
B 0 - 10: 1772us/send
B 0 - 11: 1807us/send
B 0 - 12: 1382us/send
B 0 - 13: 1380us/send
B 0 - 14: 1387us/send
B 0 - 15: 1375us/send
B 0 - 16: 1382us/send
B 0 - 17: 1383us/send
B 0 - 18: 1471us/send
B 0 - 19: 1471us/send
B 0 - 20: 1463us/send
B 0 - 21: 1462us/send
B 0 - 22: 1468us/send
B 0 - 23: 1457us/send
B 0 - 24:  466us/send
B 0 - 25:  619us/send
B 0 - 26:  671us/send
B 0 - 27: 1438us/send
B 0 - 28: 1422us/send
B 0 - 29: 1514us/send
B 0 - 30: 1789us/send
B 0 - 31: 1688us/send
B 0 - 32: 1812us/send
B 0 - 33: 1820us/send
B 0 - 34: 1719us/send
B 0 - 35: 1797us/send
B 0 - 36: 1383us/send
B 0 - 37: 1364us/send
B 0 - 38: 1373us/send
B 0 - 39: 1383us/send
B 0 - 40: 1370us/send
B 0 - 41: 1390us/send
B 0 - 42: 1468us/send
B 0 - 43: 1467us/send
B 0 - 44: 1464us/send
B 0 - 45: 1463us/send
B 0 - 46: 1475us/send
B 0 - 47: 1467us/send
With all of this, the code went from 136 MB/s to over 150 MB/s when not pinned to cores. While this isn’t close to where I’d like it to be, it is a 10% improvement in throughput. And looking at perf again, recv is completely gone from the list, which is nice!
This is the conclusion for now. If I have more interesting finds I’ll add a continuation - so I’ll keep digging.
OmniOS Community Edition weekly releases for w/c 23rd of December 2019 are now available.
For all supported OmniOS releases, OpenSSL 1.0 has been updated to 1.0.2u, which includes a security fix. This is expected to be the last update for the 1.0 series which reaches end of support on the 31st of December 2019.
OmniOS r151030 and above already include OpenSSL 1.1 as the default version, with 1.0 libraries delivered alongside for backwards compatibility. The currently selected default version can be checked using pkg mediator - check that 1.1 appears in the VERSION column.
# pkg mediator openssl
MEDIATOR      VER. SRC.  VERSION  IMPL. SRC.  IMPLEMENTATION
openssl       vendor     1.1      vendor
To change the default version to 1.1, if necessary, use:
# pkg unset-mediator openssl
Additionally, for r151032 only:
For further details, please see https://omniosce.org/releasenotes
Any problems or questions, please get in touch.
OmniOS Community Edition weekly releases for w/c 2nd of December 2019 are now available.
The following updates are available for r151032 and r151030:
Update Intel CPU Microcode to 20191115.
Fixes to support for large (> 2TB) USB hard disks.
mpt_sas driver could hang after config header request timeout.
OpenJDK updated to 1.8.0_232-09.
Additionally, for r151032 only:
KVM zones could lose network connectivity to other zones on the same machine.
Improvements to support for recent Linux distributions in lx zones.
Fixes for zfs diff between encrypted datasets.
8-bit colour modes did not work properly after boot.
Several updates and bug fixes for SMB.
make -C could cache wrong directory contents.
Fix (rare) crash if zone root cannot be mounted during boot.
For further details, please see https://omniosce.org/releasenotes
Any problems or questions, please get in touch.
Over the summer, I described preparing for my next expedition. I’m thrilled to announce that the expedition is now plotted, the funds are raised, and the bags are packed: together with Steve Tuck and Jess Frazelle, we have started Oxide Computer Company.
Starting a computer company may sound crazy (and you would certainly be forgiven a double-take!), but it stems from a belief that I hold in my marrow: that hardware and software should each be built with the other in mind. For me, this belief dates back a quarter century: when I first came to Sun Microsystems in the mid-1990s, it was explicitly to work on operating system kernel development at a computer company — at a time when that very idea was iconoclastic. And when we started Fishworks a decade later, the belief in fully integrated software and hardware was so deeply rooted into our endeavor as to be eponymous: it was the “FISH” in “Fishworks.” In working at a cloud computing company over the past decade, economic realities forced me to suppress this belief to a degree — but it now burns hotter than ever after having endured the consequences of a world divided: in running a cloud, our most vexing problems emanated from the deepest bowels of the stack, when hardware and (especially) firmware operated at cross purposes with our systems software.
As I began to think about what was next, I was haunted by the pain and futility of trying to build a cloud with PC-era systems. At the same time, seeing the kinds of solutions that the hyperscalers had developed for themselves had always left me with equal parts admiration and frustration: their rack-level designs are a clear win — why are these designs cloistered among so few? And even in as much as the hardware could be found through admirable efforts like the Open Compute Project, the software necessary to realize its full potential has remained cruelly unavailable.
Alongside my inescapable technical beliefs has been a commercial one: even as the world is moving (or has moved) to elastic, API-driven computing, there remain good reasons to run on one’s own equipment! Further, as cloud-borne SaaS companies mature from being strictly growth focused to being more margin focused, it seems likely that more will consider buying machines instead of always renting them.
It was in the confluence of these sentiments that an idea began to take shape: the world needed a company to develop and deliver integrated, hyperscaler-class infrastructure to the broader market — that we needed to start a computer company. The “we” here is paramount: in Steve and Jess, I feel blessed to not only share a vision of our future, but to have diverse perspectives on how infrastructure is designed, built, sold, operated and run. And most important of all (with the emphasis itself being a reflection of hard-won wisdom), we three share deeply-held values: we have the same principled approach, with shared aspirations for building the kind of company that customers will love to buy from — and employees will be proud to work for.
Together, as we looked harder at the problem, we saw the opportunity more and more clearly: the rise of open firmware and the broadening of the Open Compute Project made this more technically feasible than ever; the sharpening desire among customers for a true cloud-like on-prem experience (and the neglect those customers felt in the market) made it more in demand than ever. With accelerating conviction that we would build a company to do this, we needed a name — and once we hit on Oxide, we knew it was us: oxides form much of the earth’s crust, giving a connotation of foundation; silicon, the element that is the foundation of all of computing, is found in nature in its oxide; and (yes!) iron oxide is also known as Rust, a programming language we see playing a substantial role for us. Were there any doubt, that Oxide can also be pseudo-written in hexadecimal — as 0x1de — pretty much sealed the deal!
There was just one question left, and it was an existential one: could we find an investor who saw what we saw in Oxide? Fortunately, the answer to this question had been emphatic and unequivocal: in the incredible team at Eclipse Ventures, we found investors that not only understood the space and the market, but also the challenges of solving hard technical problems. And we are deeply honored to have Eclipse’s singular Pierre Lamond joining us on our board; we can imagine no better a start for a new computer company!
So while there is a long and rocky path ahead, we are at last underway on our improbable journey! If you haven’t yet, read Jess’s blog on Oxide being born in a garage. If you find yourself battling the problems we’re aiming to fix, please join our mailing list. If you are a technologist who feels this problem in your bones as we do, consider joining us. And if nothing else, and you would like to hear some terrific stories of life at the hardware/software interface, check out our incredible podcast On the Metal!