port.lua.
The coding style is starting to settle down lately and so it’s more
tempting for me to share little snippets of code.
Reasonably easy to understand, I hope?
The features of this program that I would like to draw attention to are:
The input and output objects are each one of our drivers for Intel 1G, Intel 10G, or Virtio ethernet devices. The “buffers” are blocks of physical memory that are directly used for hardware DMA.
buffer.ref() and buffer.deref().
This, my friends, is an idea for how we could write high-speed packet networking code over the next decade or so. I hope to win you over to this way of thinking :-)
If you want to know more then check out the project homepage, browse the code on github, or browse an early draft of the book.
SLIME at the Emacs Conference (16 min). History of the SLIME project and tips for people writing new Emacs-based IDEs.
Snabb Switch at the Swiss OpenStack User Group (8 min). Introducing the Snabb Switch project to people in the local OpenStack cloud computing community.
Snabb Switch at the EduPERT workshop (27 min). What networking problems can you address with x86 servers? and more on Snabb Switch.
Teclo Networks at ECLM 2011 (40 min). The early days of teclo.net the telecom startup company founded by Common Lisp hackers.
This has been really fun!
We will make the address 192.168.100.1 act like 127.0.0.1 but route packets through a custom network topology before processing them.
First start with any custom topology. In this example: west and east are endpoints with an Open vSwitch bridge ovs in between. (This would be great for applying OpenFlow rules to packets sent between local applications.)
Now assign addresses and routes for these interfaces. Packets sent to 192.168.100.1 should first be routed into interface west then switched via ovs and finally delivered to east for processing.
The ingredients are in place but they don’t work yet. If you ping 192.168.100.1 then the packets are sent to lo instead of being routed through the bridge.
And that brings us to the trick: Policy Routing.
First make Linux globally “forget” that these addresses are local.
Now packets sent to 192.168.100.1 do get routed down the right path. They are not processed at the other end though, because Linux does not remember they are local. We are halfway there.
Next create separate routing tables strictly for when packets are received after they have traversed the switch. These tables remember that the addresses are local.
Now we are done!
If you connect to 192.168.100.1 then your packets will first traverse the bridge and then be processed locally. The setup is symmetric so that return traffic will be routed back through the bridge too. This will work with all your favourite programs like ping, curl, apache, etc. Check it out by running tcpdump on west or east.
Go ahead and create interesting virtual networks on your own machine.
My lab makes me productive in society. I write really cool open source code for anybody to use, and I earn money by traveling around meeting interesting people and helping them to solve important problems. I have “hard fun” looking for ways to do these things in harmony.
My lab is where I can be creative. I can ask myself, after 25 years of thoughtful programming, what’s the right way to do fast ethernet I/O? and then develop my answer: a small 10G Ethernet device driver written in LuaJIT and embedded in the application. Fun!
My lab lets me work with friendly people in the open source world. I get to take part in the conversation on how to write networking software, and I can collaborate on equal footing with other clever hackers who have their own interests and labs, big and small.
My lab lets me buy fun toys (without asking permission :-)). Any day now I will receive a server with twenty 10G ethernet ports to share with everybody hacking on the Snabb Switch project. We get to break new ground together in the spirit of creative fun.
My lab lets me create software of enduring value. I do not have to constrain my code with secrecy and license restrictions, which means that it can take on a life of its own. The software can follow its own strong tendency to spread and thrive. I have just given a talk about SLIME at the first Emacs conference – 10-year-old work on a 37-year-old editor and still in widespread use!
My lab lets me choose to do all of these things, all by myself, and it would let me choose to do something completely different if that was what I wanted. It lets me decide when to work from my house in the Swiss Alps, a beach in Thailand, my family’s home in Australia, a cosy cafe in a big city, or a friendly office. It makes me abundantly wealthy in freedom and independence.
In summary I feel that I am close to a local maximum for creativity and freedom.
The problem has never been to “make” the company a success, but rather to preserve the great situation that I already had from the first day. On the one hand I need to avoid things like running out of money, selling shares, and signing employment contracts, and on the other hand I need to keep writing cool open source software and finding people with important problems I can help with. This way I can keep on being proud of my company.
A man is a success if he wakes up in the morning and goes to bed at night and in between does what he wants to do. – Bob Dylan
Know what I mean? I would be happy to hear from you on luke@snabb.co.
I had a lovely twitter conversation with Dimitri Fontaine, Tony Finch, and Jan Lehnardt. I started out wanting to recommend a data structure to @rahul-mr for easily garbage collecting Snabb Switch’s ethernet forwarding table. Trouble is, I don’t know the name of the data structure, so I posted this implementation in a gist and asked if anybody knows what it’s called. Here’s the code:
And the twitterverse helped me rewrite this much more concisely:
which I find rather aesthetically satisfying.
Rewriting code more concisely is one of my favorite activities. Lisp is my usual tool of choice for this purpose, so I tried a translation just for fun:
and I found it interesting that the Lua version is so much more compact than the Lisp version. Sure, I’ve compacted it with whitespace tweakery and so on, but each version is as concise as I feel comfortable making it. So I wonder if Lua is becoming my preferred vehicle for writing pseudo-pseudo-code and indulging in cutenesses?
Then having cuteness cross my mind I couldn’t help but think back to cute code I’ve worked on before. I wrote my own favorite bit of cute production code in the OLPC firmware HD Audio driver. You see, the firmware is allowed to hard-code knowledge of the physical design of the laptop and motherboard. This bit of firmware code tells the audio chip explicit details such as the size and color of each physical audio jack in the laptop. The chip can later provide this information to the operating system for presentation to a user, for example in an audio mixer application.
The code reads a bit like English - 1/8” pink left mic-in jack - but is actually purely imperative Forth code that was inspired by the wonderful book Thinking Forth.
So it goes!
P.S. I never did work out the canonical name of the data structure. Please drop me an email on luke@snabb.co if you know.
Here are the most notable features of the design:
This design suits Snabb Switch very well. In the future I foresee support for more Intel NICs, more advanced NIC features, more operating systems, and more hardware families. This will be a fun challenge over the months and years ahead.
Self-reliance FTW!
Here is a selection of the many kernel-bypass solutions that are available:
These products each take their own design approaches and it’s interesting to consider choices that they make.
If you are developing high-speed (10+ Gbps) networking applications then you should seriously consider using one of these solutions. If you are an expert on one of these solutions then please tell us about it on the Snabb Switch Reddit!
I have been optimizing the LuaJIT selftest code to transmit ethernet packets in a loop. I am pretty encouraged by the performance that I see: 3.1% CPU utilization on a low-end Hetzner EX6 machine to saturate a 1Gbps ethernet port with tiny packets. That is 28 nanoseconds of CPU time per packet.
I hope the details will be interesting. It is not so often that people write about low-level networking in high-level dynamic programming languages, is it? So, to give a quick taste, the driver source code is in intel.lua and the selftest main loop works like this:
and here is what that means:
tx_load
add_txbuf
flush_tx
get_time_ns
This is all accomplished by directly controlling the NIC using memory-mapped register I/O and DMA with shared memory. The only operating system calls here are to sleep and check the time.
This is a really fun sort of programming to be doing!
Going forward I am really excited to see how much of a production quality Ethernet switch can be written in a high-level dynamic programming language, and how neatly any parts that are ultimately written in C can be integrated into the whole. This is an open source project and you are welcome to join in the fun too!
(Comments welcome on the Snabb Switch Reddit.)
Stuart Bailey’s talk was heartwarming. He’s an Erlang guy who’s finally meeting other Erlang people in the flesh. “Honey, you wouldn’t believe it, I can talk about Erlang here and they don’t look at me like I’m crazy.” I think it’s a beautiful moment that many of us can relate to.
Amazingly, Lisp code was on screens all over the place with no fuss being made whatsoever. This was all Clojure. Rich Hickey was there. I’d heard him speak once before, five years ago at the Lisp 50th anniversary event at OOPSLA, where everybody looked to him as the great hope to give Lisp a fresh start. Looks to me like he has delivered on what he promised. Great work Rich! That is no small feat.
I met a lot of friendly and interesting people. Haskell hackers, Xen hackers, even another Queenslander like me. I found the boyish enthusiasm of Simon Peyton Jones and Joe Armstrong very infectious, as always. I was also really glad to meet up with a lot of my old friends from the Stockholm Erlang scene.
The language runtime panel reminded me of one idea that’s been rattling around in my head forever. Take it as given that (a) the Erlang VM is great for concurrency because it gives you efficient process isolation and (b) hardware advances are making the Linux kernel’s process isolation more efficient every year. So when will it be time to start using the Linux kernel as an Erlang-like language runtime environment? If you strip away all the layers of crud on top, are we already there? This seems like a valid research question.
I like the overall conference theme of helping to introduce niche ideas to a wider group of people. I am a bit outside the target demographic myself. Tech Mesh is full of ideas with ten thousand devotees trying to spread themselves to the next million. I am more comfortable with the smaller memes myself, when everybody thinks you are mad and you have to work hard to convince the first ten or hundred people that you’re not. That is why I am working on high-performance networking firmware in userspace with LuaJIT device drivers. I suppose everybody has their own ideal proximity to mainstream thinking and that every technology is a moving target in that respect.
The organization was great. Plentiful food, coffee, and other beverages. A bit chilly but hey, this is England. The venue was right near the British Museum so I finally saw the Rosetta Stone for the first time. Thanks everybody!
My favorite piece of firmware is Openfirmware. This is shipped on a 1MB ROM chip in the OLPC XO instead of a traditional BIOS. It’s a complete and self-sufficient software environment that has been carefully crafted over the past few decades by super-hacker Mitch Bradley. If you have an OLPC XO and would like to learn about firmware then I can recommend studying Mitch’s Forth Lessons.
These days we can choose to deploy our applications as firmware, software, and increasingly as full-scale OS distributions – kilobytes, megabytes, or gigabytes. There is sometimes one style that’s clearly the best and other times several options are all reasonable choices.
Firmware is fun and refreshing to write. I have chosen to develop my new Snabb Switch project as firmware. I will post a lot more about the implications of this decision over time.
I believe that making programs readable is one of the best and easiest ways to improve them.
I gather that it used to be a common practice to print program listings and read them. I hear about it in anecdotes from programmers I respect, and I also see that many older programs appear to have been written with printing and reading in mind: they contain pagebreak characters; they include their user-documentation in comments; and they are broken into logical sections of a couple of pages or so.
Let me share some quotations that have stuck in my memory:
The mask layout program by BillCroft at Purdue EE department - This is a truly awesome C program that could do VLSI scale designs on a PDP-11. The implementation included the command processing, high-resolution graphics, and custom database. Amazingly the program was only about half an inch thick and could be read in an afternoon. (Contrast this to my own companies’ graphics drivers for the same device which ran ten times this for the drivers alone.)
I was the one who decided to rewrite the [program listing generator] from scratch as a standalone program, partly because I wanted to add substantial new facilities, such as the ability to list many files at once and provide inter-file cross-references.
Programming is, among other things, a kind of writing. One way to learn writing is to write, but in all other forms of writing, one also reads. We read examples - both good and bad - to facilitate learning. But how many programmers learn to write programs by reading programs? A few, but not many. And with the advent of terminals, things are getting worse, for the programmer may not even see his own program in a form suitable for reading. In the old days … programmers would while away the time by reading each others’ programs. Some even went so far as to read programs from the program library - which in those days was still a library in the old sense of the term.
– Gerald Weinberg, The Psychology of Computer Programming
How different things used to be. People measured programs in “thickness”, wrote special listing generators (in 300 pages of PDP-10 assembler - quite an effort), and dreaded that one day people may not sit down and read programs, not even their own.
I know it’s easy to feel that with our fancy IDEs we’ve advanced beyond such archaic ideas, but I believe that reading whole programs is a Good Thing and worth holding on to.
I’ve done a few experiments with readable programs over the years. The first one was about 10 years ago, a program listing generator called pbook.el. pbook itself actually sucks - it’s way too much code and way too much commentary - but regtest.erl is one reasonable example use from my professional life. I wrote another listing generator called elit.el that was intended to mimic Steele’s style with RABBIT but this program sucked too for the same reasons. I’m not sure that listing generators are really needed, at least for short programs like bets.py. Early versions of SLIME were quietly pbook-formatted and I used to read them through without mentioning it to anybody.
So why blog about this now?
I’m working on a new project called Snabb Switch. I want the Snabb Switch code to be really good, so I’m very tempted to make it readable. I’m starting to think about what could be a practical tool for generating a program listing roughly the size of a small book, with chapters like “Intel NIC device driver”, “OpenFlow forwarding engine”, and so on.
The idea I’m playing with at the moment is to have a ‘make’ target to publish the Snabb Switch on Leanpub. This way I can calmly read through my source code with a red pen on my train rides between Zurich and the Alps. I expect that this would increase the quality of my source code overall and be well worth the effort, quite independent of whether other people decide to read the program too.
I’ve at least made one pleasing discovery: thanks to the beauty of Markdown I’m now able to write a version of pbook that is down from 241 lines to a mere 43 characters: sed -E -e 's/^/    /g' -e 's/^    --- ?//g'. (See my Gist for a few more details.)
So that is my brain dump for today. Are you also interested in readable programs? Feel free to strike up a conversation with me on luke@snabb.co.
This post is a tribute to the Emerging Languages conference that I just missed. I’ve started a new project called Snabb Switch and this is a braindump on how I’m choosing which programming language to use.
The Snabb Switch project is a low-level networking stack with emphasis on hypervisors. The #1 technical requirement is that you’re receiving a packet every microsecond or so and you have to do something clever with it and ship it back out quickly. I have a background of real programming in a bunch of languages: C, Erlang, Common Lisp, Scheme, Forth, Smalltalk, Emacs Lisp, Java, and the usual Unix suspects. I have the engineering philosophy of a firmware developer: I appreciate minimalism and I don’t want to depend on software that I’m not willing to understand and debug (less is more). I want to faithfully follow my own engineering values and simultaneously make the project accessible to other interested people.
Here are the languages on my radar for this project and a raw dump of my thoughts on them:
C. In my mind some code just is morally C code, no matter what language it’s written in, and other code morally is not C. I like to use C for code that’s morally C, like device drivers and packet shuffling, but I don’t want to use C for stuff that’s morally high-level, like configuration and status reporting and scripting. So I’m only happy to use C for problems that are 100% low-level or otherwise in a cocktail with a higher-level language.
C++. I am impressed when I see people use C++ to overcome the limitations of C and find clever ways to write whole applications. They win the ICFP programming contest pretty often and that impresses me. My gut feeling though is that this works well in tight organisations like Google but that C++ is a barrier to entry in more loosely coupled open source projects. The language also has a bit of a corporate feel to me – I’ve never picked up a C++ book just for fun. The C hacker in me is jealous of the good collection of basic data structures though.
Objective-C. This is another chance to keep what’s good about C and overcome the limitations. I don’t want to commit to OSX as the target platform though – I’m focused on Linux and open to Windows – and I’m not confident that it’s portable in practice.
Erlang. I have done a lot of Erlang programming over the years and I know that it’s really excellent for a wide range of problems. I don’t want to use it for this project though. Erlang isn’t the right tool for writing morally-C code, and I haven’t really enjoyed using the FFI (linked-in drivers) in the past. I also feel that the Erlang runtime system is a really scary program these days – I don’t want to have to chase through it with gdb when hunting really tough bugs.
SBCL. Common Lisp is an extremely interesting option and one that I’ve explored in depth previously. You can write morally-C code in Common Lisp, but I feel like you have to work really hard when you do this, and I prefer to do my bit-bashing in C. CFFI is not quite convenient enough to make this fun for me in day to day usage. The SBCL runtime system is also of a similar complexity level to Erlang and I’ve found that it interacts badly with tools like strace and gdb. I don’t want to be chasing weird bugs involving e.g. signal handlers waking up when I write to memory that the GC has write-protected etc.
Embedded Common Lisp (ECL). Is this a Lisp with a dream-like FFI and minimalist runtime system? I’m not sure, it sounds a bit too good to be true, and I haven’t investigated because I don’t have a specific agenda to use Lisp. I would love to hear from people who have experience building systems comparable to Snabb Switch in ECL.
Openfirmware. Mitch Bradley’s Openfirmware Forth system is a masterpiece and in many ways very well suited to my project. Compact size, good mix of high-level (bytecode) and low-level (inline assembler), minimal and predictable runtime system, suitable for writing morally-C code. I reckon that Openfirmware based network equipment would work extremely well, but I also reckon that very few people could build such systems in a realistic amount of time. Forth is also naturally a barrier to adoption: it’s tricky to get the hang of and the study of Forth feels more like advanced computer science than basic engineering. So rather than use Forth I will seek out other tools that are acceptable to my inner Forth programmer.
LuaJIT. Wow! Lots of positive points. The whole implementation is small and simple. The runtime system is minimal and leaves me space to make important decisions like whether to use threads and how. The FFI is the best I’ve ever seen – it makes it easy to talk directly to hardware via shared memory and easy to call into C libraries. I can use a lovely workflow where morally-C code is initially written in LuaJIT and then selectively rewritten in C over time. The JIT is creating its own machine code in Forth-hacker real programmer style – I don’t understand it but I want to, and its small implementation makes this seem realistic. And there’s a reasonable story on soft realtime.
So: I’ve started off using LuaJIT and C. Initially the code is mostly LuaJIT but I do feel like I’m comfortably able to move the LuaJIT-C border around to see where it feels the best. This feels comfortable for the parts of my soul that love C, Forth, Scheme, and Emacs Lisp. The bits of my soul that love Erlang, Common Lisp, and Smalltalk will have to be patient :-).
I will blog about how it goes over time. If you want to discuss this stuff in more depth feel free to email me on luke@snabb.co.
The Nehalem architecture looks like this:
So the main potential bottlenecks are:
Intel’s Performance Counter Monitor can tell you the utilization of these links. It’s like top for memory bandwidth, inter-processor data shuffling, PCIe load, etc. I used this tool for verifying that PCs can push 40Gbps of full-duplex ethernet traffic without any hassle. Intel make some really fine tech.
Give it a try! It’s really easy to install.
Here’s how it looks:
EXEC : instructions per nominal CPU cycle
IPC : instructions per CPU cycle
FREQ : relation to nominal CPU frequency='unhalted clock ticks'/'invariant timer ticks' (includes Intel Turbo Boost)
AFREQ : relation to nominal CPU frequency while in active state (not in power-saving C state)='unhalted clock ticks'/'invariant timer ticks while in C0-state' (includes Intel Turbo Boost)
L3MISS: L3 cache misses
L2MISS: L2 cache misses (including other core's L2 cache *hits*)
L3HIT : L3 cache hit ratio (0.00-1.00)
L2HIT : L2 cache hit ratio (0.00-1.00)
L3CLK : ratio of CPU cycles lost due to L3 cache misses (0.00-1.00), in some cases could be >1.0 due to a higher memory latency
L2CLK : ratio of CPU cycles lost due to missing L2 cache but still hitting L3 cache (0.00-1.00)
READ : bytes read from memory controller (in GBytes)
WRITE : bytes written to memory controller (in GBytes)

Core (SKT) | EXEC | IPC  | FREQ | AFREQ | L3MISS | L2MISS | L3HIT | L2HIT | L3CLK | L2CLK | READ | WRITE
   0    0    0.59   0.60   0.98   1.00    1743 K   3978 K    0.56    0.62    0.15    0.07    N/A    N/A
   1    0    0.61   0.62   0.97   1.00    2595 K   3356 K    0.23    0.64    0.23    0.02    N/A    N/A
   2    0    0.49   0.59   0.83   1.00    2205 K   3198 K    0.31    0.60    0.22    0.03    N/A    N/A
   3    0    0.06   0.32   0.18   1.00     715 K    921 K    0.22    0.35    0.34    0.02    N/A    N/A
--------------------------------------------------------------------------------------------------------
 TOTAL  *    0.43   0.59   0.74   1.00    7259 K     11 M    0.37    0.61    0.21    0.04   6.51   2.85

Instructions retired: 3707 M ; Active cycles: 6317 M ; Time (TSC): 2134 Mticks ; C0 (active,non-halted) core residency: 73.99 %

PHYSICAL CORE IPC : 0.59 => corresponds to 14.67 % utilization for cores in active state
Instructions per nominal CPU cycle: 0.43 => corresponds to 10.86 % core utilization over time interval
--------------------------------------------------------------------------------------------------------
There are a lot of communication mechanisms in widespread use: C APIs, system calls, RESTful services, JSON-RPC, AMQP, and plenty more besides. Some mechanisms survive, a lot influence new successors, and quite a lot of them just fall out of fashion and turn to dust. Shared memory is one with a timeless quality. Year after year it’s the glue that holds our computers together.
Shared memory is a simple mechanism. Many different programs are concurrently accessing the physical RAM chips in the computer. The programs are written in completely different programming languages, they are executing on different microchips, and some of them are implemented as hardware rather than software. The programs all share a common abstraction: LOAD and STORE operations towards a big array of machine words. The rest is a matter of conventions, design patterns, and data structures. That’s all we need!
I really enjoy the versatility. How sweet it is to be able to drive the hardware memory controller so directly.
Just for fun, here’s a shared memory interface I’m playing with for ethernet networking. That is: to use shared memory to send and receive packets between a “host” and a “device”. The host and device can be written in any programming language, or indeed in hardware, and this is roughly how hardware packet I/O interfaces really look.
Here’s an example memory layout. I sketched it using C syntax and defined a packet ring buffer for transmission in each direction.
Here’s a snippet using the data structure from C:
and here’s a snippet from LuaJIT:
and with any luck I’ll find an excuse to see how it looks in a hardware description language like Verilog too.
That’s all. I really appreciate shared memory. Hardware people use it all the time. It’s not the most high level communication protocol around, but it is simple and accessible, and it can be fun for software people too. I think so, anyway :-)
I’ve now made a homepage, lukego.com, to keep track of my comings and goings. My old blog was lukego.livejournal.com.