Luke Gorrie's blog

21st Century Network Hacking

Readable Programs

I love readable programs. I mean: programs that you can print out, take to the park, and read from top to bottom. The reason I love readable programs is that somebody has made the effort to make them readable, perhaps simply by printing them out and running through them with a red pen a few times, and I believe this simple process to have a magical effect on a program’s clarity.

I believe that making programs readable is one of the best and easiest ways to improve them.

I gather that it used to be a common practice to print program listings and read them. I hear about it in anecdotes from programmers I respect, and I also see that many older programs appear to have been written with printing and reading in mind: they contain pagebreak characters; they include their user-documentation in comments; and they are broken into logical sections of a couple of pages or so.

Let me share some quotations that have stuck in my memory:

The mask layout program by BillCroft at Purdue EE department - This is a truly awesome C program that could do VLSI scale designs on a PDP-11. The implementation included the command processing, high-resolution graphics, and custom database. Amazingly the program was only about half an inch thick and could be read in an afternoon. (Contrast this to my own company’s graphics drivers for the same device which ran ten times this for the drivers alone.)

Ward Cunningham, WardsWiki

I was the one who decided to rewrite the [program listing generator] from scratch as a standalone program, partly because I wanted to add substantial new facilities, such as the ability to list many files at once and provide inter-file cross-references.

Guy Steele, ll1-discuss

Programming is, among other things, a kind of writing. One way to learn writing is to write, but in all other forms of writing, one also reads. We read examples - both good and bad - to facilitate learning. But how many programmers learn to write programs by reading programs? A few, but not many. And with the advent of terminals, things are getting worse, for the programmer may not even see his own program in a form suitable for reading. In the old days … programmers would while away the time by reading each other’s programs. Some even went so far as to read programs from the program library - which in those days was still a library in the old sense of the term.

Gerald Weinberg, The Psychology of Computer Programming

How different things used to be. People measured programs in “thickness”, wrote special listing generators (in 300 pages of PDP-10 assembler - quite an effort), and dreaded that one day people might not sit down and read programs, not even their own.

I know it’s easy to feel that with our fancy IDEs we’ve advanced beyond such archaic ideas, but I believe that reading whole programs is a Good Thing and worth holding on to.

I’ve done a few experiments with readable programs over the years. The first one was about 10 years ago, a program listing generator called pbook.el. pbook itself actually sucks - it’s way too much code and way too much commentary - but regtest.erl is one reasonable example use from my professional life. I wrote another listing generator called elit.el that was intended to mimic Steele’s style with RABBIT but this program sucked too for the same reasons. I’m not sure that listing generators are really needed, at least for short programs like bets.py. Early versions of SLIME were quietly pbook-formatted and I used to read them through without mentioning it to anybody.

So why blog about this now?

I’m working on a new project called Snabb Switch. I want the Snabb Switch code to be really good, so I’m very tempted to make it readable. I’m starting to think about what could be a practical tool for generating a program listing roughly the size of a small book, with chapters like “Intel NIC device driver”, “OpenFlow forwarding engine”, and so on.

The idea I’m playing with at the moment is to have a ‘make’ target to publish the Snabb Switch on Leanpub. This way I can calmly read through my source code with a red pen on my train rides between Zurich and the Alps. I expect that this would increase the quality of my source code overall and be well worth the effort, quite independent of whether other people decide to read the program too.

I’ve at least made one pleasing discovery: thanks to the beauty of Markdown I’m now able to write a version of pbook that is down from 241 lines to a mere 43 characters: sed -E -e 's/^/    /g' -e 's/^    --- ?//g'. (See my Gist for a few more details.)

So that is my brain dump for today. Are you also interested in readable programs? Feel free to strike up a conversation with me on luke@snabb.co.

Luke’s Highly Opinionated Programming Language Roundup 2012

Hoi Zäme!

This post is a tribute to the Emerging Languages conference that I just missed. I’ve started a new project called Snabb Switch and this is a braindump on how I’m choosing which programming language to use.

The Snabb Switch project is a low-level networking stack with emphasis on hypervisors. The #1 technical requirement is that you’re receiving a packet every microsecond or so and you have to do something clever with it and ship it back out quickly. I have a background of real programming in a bunch of languages: C, Erlang, Common Lisp, Scheme, Forth, Smalltalk, Emacs Lisp, Java, and the usual Unix suspects. I have the engineering philosophy of a firmware developer: I appreciate minimalism and I don’t want to depend on software that I’m not willing to understand and debug (less is more). I want to faithfully follow my own engineering values and simultaneously make the project accessible to other interested people.

Here are the languages on my radar for this project and a raw dump of my thoughts on them:

C. In my mind some code just is morally C code, no matter what language it’s written in, and other code morally is not C. I like to use C for code that’s morally C, like device drivers and packet shuffling, but I don’t want to use C for stuff that’s morally high-level, like configuration and status reporting and scripting. So I’m only happy to use C for problems that are 100% low-level or otherwise in a cocktail with a higher-level language.

C++. I am impressed when I see people use C++ to overcome the limitations of C and find clever ways to write whole applications. They win the ICFP programming contest pretty often and that impresses me. My gut feeling though is that this works well in tight organisations like Google but that C++ is a barrier to entry in more loosely coupled open source projects. The language also has a bit of a corporate feel to me – I’ve never picked up a C++ book just for fun. The C hacker in me is jealous of the good collection of basic data structures, though.

Objective-C. This is another chance to keep what’s good about C and overcome the limitations. I don’t want to commit to OSX as the target platform though – I’m focused on Linux and open to Windows – and I’m not confident that it’s portable in practice.

Erlang. I have done a lot of Erlang programming over the years and I know that it’s really excellent for a wide range of problems. I don’t want to use it for this project though. Erlang isn’t the right tool for writing morally-C code, and I haven’t really enjoyed using the FFI (linked-in drivers) in the past. I also feel that the Erlang runtime system is a really scary program these days – I don’t want to have to chase through it with gdb when hunting really tough bugs.

SBCL. Common Lisp is an extremely interesting option and one that I’ve explored in depth previously. You can write morally-C code in Common Lisp, but I feel like you have to work really hard when you do this, and I prefer to do my bit-bashing in C. CFFI is not quite convenient enough to make this fun for me in day to day usage. The SBCL runtime system is also of a similar complexity level to Erlang and I’ve found that it interacts badly with tools like strace and gdb. I don’t want to be chasing weird bugs involving e.g. signal handlers waking up when I write to memory that the GC has write-protected etc.

Embedded Common Lisp (ECL). Is this a Lisp with a dream-like FFI and minimalist runtime system? I’m not sure, it sounds a bit too good to be true, and I haven’t investigated because I don’t have a specific agenda to use Lisp. I would love to hear from people who have experience building systems comparable to Snabb Switch in ECL.

Openfirmware. Mitch Bradley’s Openfirmware Forth system is a masterpiece and in many ways very well suited to my project. Compact size, good mix of high-level (bytecode) and low-level (inline assembler), minimal and predictable runtime system, suitable for writing morally-C code. I reckon that Openfirmware based network equipment would work extremely well, but I also reckon that very few people could build such systems in a realistic amount of time. Forth is also naturally a barrier to adoption: it’s tricky to get the hang of and the study of Forth feels more like advanced computer science than basic engineering. So rather than use Forth I will seek out other tools that are acceptable to my inner Forth programmer.

LuaJIT. Wow! Lots of positive points. The whole implementation is small and simple. The runtime system is minimal and leaves me space to make important decisions like whether to use threads and how. The FFI is the best I’ve ever seen – it makes it easy to talk directly to hardware via shared memory and easy to call into C libraries. I can use a lovely workflow where morally-C code is initially written in LuaJIT and selectively rewritten in C over time. The JIT is creating its own machine code in Forth-hacker real programmer style – I don’t understand it but I want to, and its small implementation makes this seem realistic. And there’s a reasonable story on soft realtime.

So: I’ve started off using LuaJIT and C. Initially the code is mostly LuaJIT but I do feel like I’m comfortably able to move the LuaJIT-C border around to see where it feels the best. This feels comfortable for the parts of my soul that love C, Forth, Scheme, and Emacs Lisp. The bits of my soul that love Erlang, Common Lisp, and Smalltalk will have to be patient :-).

I will blog about how it goes over time. If you want to discuss this stuff in more depth feel free to email me on luke@snabb.co.

Intel’s Performance Counter Monitor

Wanna find the limiting factor in I/O performance on an Intel Nehalem server app? Here’s one idea.

The Nehalem architecture looks like this:
[Figure: Nehalem architecture diagram]

So the main potential bottlenecks are:

  1. RAM<->CPU @ approx 256Gbps to each CPU’s RAM bank. (Too many memory copies?)
  2. CPU<->CPU @ approx 200Gbps for sharing RAM over QPI. (NUMA all mucked up?)
  3. PCIe @ approx 72Gbps. (Out of ports? Crappy motherboard?)
  4. Storage disk/SSD. (Waiting all the time?)

Intel’s Performance Counter Monitor can tell you the utilization of these links. It’s like top for memory bandwidth, inter-processor data shuffling, PCIe load, etc. I used this tool for verifying that PCs can push 40Gbps of full-duplex ethernet traffic without any hassle. Intel make some really fine tech.

Give it a try! It’s really easy to install.

Here’s how it looks:

 EXEC  : instructions per nominal CPU cycle
 IPC   : instructions per CPU cycle
 FREQ  : relation to nominal CPU frequency='unhalted clock ticks'/'invariant timer ticks' (includes Intel Turbo Boost)
 AFREQ : relation to nominal CPU frequency while in active state (not in power-saving C state)='unhalted clock ticks'/'invariant timer ticks while in C0-state'  (includes Intel Turbo Boost)
 L3MISS: L3 cache misses 
 L2MISS: L2 cache misses (including other core's L2 cache *hits*) 
 L3HIT : L3 cache hit ratio (0.00-1.00)
 L2HIT : L2 cache hit ratio (0.00-1.00)
 L3CLK : ratio of CPU cycles lost due to L3 cache misses (0.00-1.00), in some cases could be >1.0 due to a higher memory latency
 L2CLK : ratio of CPU cycles lost due to missing L2 cache but still hitting L3 cache (0.00-1.00)
 READ  : bytes read from memory controller (in GBytes)
 WRITE : bytes written to memory controller (in GBytes)


 Core (SKT) | EXEC | IPC  | FREQ  | AFREQ | L3MISS | L2MISS | L3HIT | L2HIT | L3CLK | L2CLK  | READ  | WRITE 

   0    0     0.59   0.60   0.98    1.00    1743 K   3978 K    0.56    0.62    0.15    0.07     N/A     N/A
   1    0     0.61   0.62   0.97    1.00    2595 K   3356 K    0.23    0.64    0.23    0.02     N/A     N/A
   2    0     0.49   0.59   0.83    1.00    2205 K   3198 K    0.31    0.60    0.22    0.03     N/A     N/A
   3    0     0.06   0.32   0.18    1.00     715 K    921 K    0.22    0.35    0.34    0.02     N/A     N/A
------------------------------------------------------------------------------------------------------------
 TOTAL  *     0.43   0.59   0.74    1.00    7259 K     11 M    0.37    0.61    0.21    0.04    6.51    2.85

 Instructions retired: 3707 M ; Active cycles: 6317 M ; Time (TSC): 2134 Mticks ; C0 (active,non-halted) core residency: 73.99 %

 PHYSICAL CORE IPC                 : 0.59 => corresponds to 14.67 % utilization for cores in active state
 Instructions per nominal CPU cycle: 0.43 => corresponds to 10.86 % core utilization over time interval
----------------------------------------------------------------------------------------------

Shared Memory: A Moment of Appreciation

Today I’m feeling appreciative of shared memory.

There are a lot of communication mechanisms in widespread use: C APIs, system calls, RESTful services, JSON-RPC, AMQP, and plenty more besides. Some mechanisms survive, a lot influence new successors, and quite a lot of them just fall out of fashion and turn to dust. Shared memory is one with a timeless quality. Year after year it’s the glue that holds our computers together.

Shared memory is a simple mechanism. Many different programs are concurrently accessing the physical RAM chips in the computer. The programs are written in completely different programming languages, they are executing on different microchips, and some of them are implemented as hardware rather than software. The programs all share a common abstraction: LOAD and STORE operations towards a big array of machine words. The rest is a matter of conventions, design patterns, and data structures. That’s all we need!

I really enjoy the versatility. How sweet it is to be able to drive the hardware’s memory controller so directly.

Playing around

Just for fun, here’s a shared memory interface I’m playing with for ethernet networking. That is: to use shared memory to send and receive packets between a “host” and a “device”. The host and device can be written in any programming language, or indeed in hardware, and this is roughly how hardware packet I/O interfaces really look.

Here’s an example memory layout. I sketched it using C syntax and defined a packet ring buffer for transmission in each direction.

enum {
    RING_SIZE = 1024,
    PACKET_SIZE = 1504,
};

// One ethernet packet.
struct packet
{
    unsigned int length;         // length of packet
    char data[PACKET_SIZE];      // on-the-wire packet data
};

// Ring buffer containing ethernet packets.
struct ring
{
    unsigned int head;           // position of first unread packet
    unsigned int tail;           // position where the next packet is written
    unsigned int overflow_count; // number of packets dropped due to overflow
    struct packet packets[RING_SIZE];
};

// Ethernet device with two ring buffers for transmit and receive.
struct dev
{
    unsigned int magic;          // magic number identifying the structure
    unsigned int version;        // version of memory layout
    struct ring dev2host;        // ring of packets from device to host
    struct ring host2dev;        // ring of packets from host to device
};

Here’s a snippet using the data structure from C:

bool is_empty(struct ring *ring) {
    return ring->head == ring->tail;
}

and here’s a snippet from LuaJIT:

function is_full (ring)
   return ring.head == (ring.tail + 1) % C.RING_SIZE
end

and with any luck I’ll find an excuse to see how it looks in a hardware description language like Verilog too.

That’s all. I really appreciate shared memory. Hardware people use it all the time. It’s not the most high level communication protocol around, but it is simple and accessible, and it can be fun for software people too. I think so, anyway :-)

New Blog!

New blog! This time I’m taking Octopress for a spin. I really like the idea of markdown for text and static generation of content. Reminds me a bit of using Latte in the old days. Latte became a dead-end when bitrot set in on the toolchain for generating pages. Let’s see how this goes.

I’ve made a homepage lukego.com now to keep track of my comings and goings and my old blog was lukego.livejournal.com.