Sunday, January 31, 2010

Community helping the community, ala YouTube

The video generation is taking hold in the OpenSolaris community. Recently, Michelle Knight, a self-described general lunatic, asked for help on the OpenSolaris ZFS forum. But quite unlike most folks who get help and quietly wander away, or (hopefully) post a summary for posterity, she made a video describing what she learned and posted it to YouTube. Very cool. Well done, Michelle!

Wednesday, January 27, 2010

Magic Mouse and Ring Finger Solution

I use an Apple Magic Mouse and really do love it. I use Adobe InDesign CS4 for writing technical documents. I also have a ring finger. Each of these things works well by itself. Together, they don't work well. This is a typical systems engineering problem. Each part works as designed, but together they don't work well.
Now you are probably wondering why these three things don't work together - millions of people use mice, thousands of writers use InDesign, and almost all of the people on Earth have ring fingers. Let me explain.

  1. The Magic Mouse is so very, very cool because the entire surface is touch sensitive. It is very easy to use and allows you to do things you could never do before with a mouse. For instance, many mice have had a little scroll wheel and OSes are designed to use the scroll wheel movement to scroll up and down inside a window. Some mice have little trackballs that allow you to scroll left or right, too. The Magic Mouse is almost like giving your fingers a trackpad on top of the mouse. Implementing a multiple button click function is simply a matter of the programming that determines where your finger is when you press. Very cool. Very habit forming. In just a few short weeks, my hand is already forgetting how to use older mice.
  2. Adobe InDesign is a very powerful publishing product. I've been using FrameMaker since 1987 and find that InDesign has many of the features I've used in FrameMaker, but is even more powerful and flexible. One of the interesting concepts in InDesign is the pasteboard. Your document sits atop the pasteboard. If you want to move a frame, text, image, or other object out of your document quickly, but without deleting the object, then you can just slide it over to the pasteboard beside the page. Only the objects on the page are printed or exported to PDF, so you can use the pasteboard to keep your miscellaneous collection of stuff very easily. The pasteboard is larger than your page, and by default adds about 8 inches to each side of your document. This means that your pasteboard for a letter-sized document is around 24 inches wide. Since my screen is not 24 inches wide (is Santa listening? I'll be a good boy) the windows I use have horizontal scroll bars. For the most common case, the page is in the center of the scroll bar. I've spent a few hours trying to figure out how to make the pasteboard thinner, but none of the tricks work.
  3. My ring finger has a tendency to rest on the right side of the mouse while my index and middle fingers wander about the mouse top and click.
OK, so now you should be able to recognize the problem. The Magic Mouse interprets my resting ring finger as a horizontal scroll, and InDesign extends the scrolling area by 60%, most of which is area I rarely use. In other words, while I'm working away, I suddenly get scrolled off into the blank area of the pasteboard. Since the document is in the middle, I have to scroll back to the center, which is harder to do than scrolling full left or full right.
The solution I've found is to put a small bit of painter's tape over the area where my ring finger rests. I could have used duct tape, and that would make a good joke, but I prefer the painter's tape for now.

So far this is working well. A programmatic way to build dead spots on the Magic Mouse would be a useful feature. InDesign could allow me to control the horizontal size of the pasteboard. All of these programming changes are perhaps not difficult, but will also not be solved soon. For now I can be highly productive without having to horizontally scroll back to center on InDesign.
Now, about those deadlines...

Integrated Systems Engineering Redux

Today, Oracle is presenting a webcast describing their strategy for the company going forward after the Sun acquisition. In the first 20 minutes there was much discussion about delivering integrated systems: applications + database + OS + hardware. This is a tremendous value proposition. It is such a tremendous value proposition that it could have been taken from the slides we put together 8 years ago in Sun's Integrated Systems Engineering group.
We had difficulties with the business of delivering such integrated solutions. Sure, there were a few technical difficulties, but working together with the different engineering groups at Sun and Oracle, we were able to deliver a good technical solution. However, the business challenges of working across different product groups and companies were insurmountable at the time. In the end, the Integrated Systems Engineering group was disbanded and the products were EOLed.
In my position as Chief Architect for the Integrated Systems Engineering group, I had the pleasure of working with many talented engineers and product marketing teams. But the experience taught me that very good technical solutions may not be successful because of the rest of the business activities needed to ensure the right products offering the right value are delivered at the right time to the right market. And those products include much more than what a systems engineering team can integrate in a lab. This is why I entered the EMBA program at USC's Marshall School of Business. I already knew how to integrate complex systems and make them simple to install and manage. But I did not know how to take such a product and make it successful in the market. I'm a lot smarter now.
I wish Oracle well in their future endeavors. The value proposition is good. The need exists. The challenges are difficult. If they can overcome the non-technical barriers, the future looks bright.

Monday, January 25, 2010

National recognition for San Diego County CERT

The AMGEN Tour of California bicycle race rode to the top of Palomar Mountain last year. I mentioned it in my blog prior to the race. This month I am pleased to announce that the CERT National Newsletter (Volume 2, issue 3) features a story about the preparation and nearly flawless execution of the event. This was truly a case where dozens of volunteers came together, at short notice, to pull off a significant event involving thousands of people and a nationwide TV audience.
Bill Leininger and the crew from Palomar Mountain Volunteer Fire Department CERT demonstrated superb leadership and I am proud to have been able to participate. I'd also like to thank all of the volunteers and groups who came together to make this event a success.
On page 2, you can see yours truly (in the red shirt, strategically located near the donuts) during the pre-event briefing.

Friday, January 15, 2010

Looking at I/O Performance with Bubbles

I am helping a client work through some performance problems and thought I might share a view with you. The data was collected for 57 seconds during a production run. The problem we are chasing is the usual performance problem: latency. In some cases the latency is close to 100ms, which would make everyone except a floppy disk user unhappy. The view of the data is intended to shed some light on where problems might exist that we need to further explore. Using summary data from tools like iostat, vmstat, mpstat, prstat, or top won't show you anything like this.

In the bubble chart, the Y axis is the size of the I/Os. Along the X axis, reads are on the left and writes are on the right. The size of the bubbles is the latency in microseconds. Big bubbles mean big performance problems. Press the play button to see the changes over time.

There are two ZFS transaction group (txg) commits: one at 8:49:14 and another at 8:49:44. ZFS will, by default depending on the version, commit the txg every 30 seconds. When the txg commits, you will see a flurry of relatively small (8 KB) write activity. Though this may look really terrible (and it is) remember that txg commits are asynchronous, so you will rarely feel them. But in this sample, some of the txg I/Os take more than 50 milliseconds to complete. In the entire sample, the worst latency was more than 370 milliseconds (more than 1/3 of a second). For a slow HDD, 50 milliseconds might not be so bad. But in this case, the target is an expensive RAID array. More work needed to get to the bottom of this mystery...
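Numbers like the worst-case latency and the count of slow I/Os can be pulled from per-I/O trace data with very little code. Here is a minimal sketch, assuming a hypothetical text format with one I/O per line: direction (R or W), size in bytes, and latency in microseconds. The function name `summarize_latency` and the input format are assumptions for illustration, not the tooling used to produce the chart above.

```shell
#!/bin/sh
# Hypothetical helper: summarize per-I/O latency records read from stdin.
# Assumed input format, one I/O per line: "<R|W> <size_bytes> <latency_us>"
# Optional first argument: the "slow" threshold in microseconds (default 50 ms).
summarize_latency() {
    awk -v limit_us="${1:-50000}" '
        {
            n++                                   # total I/Os seen
            lat = $3 + 0                          # force numeric latency
            if (lat > max) max = lat              # track the worst latency
            if (lat > limit_us + 0) slow++        # count I/Os over the threshold
        }
        END { printf "ios=%d slow=%d worst_us=%d\n", n, slow, max }
    '
}
```

For example, `summarize_latency 50000 < trace.txt` (where `trace.txt` is a stand-in name) reports how many I/Os exceeded 50 milliseconds. A summary like this still hides the timing pattern, which is why the per-I/O view is worth the trouble.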

If you would like to see this sort of analysis for your system, contact me and we can discuss an engagement.

Tuesday, January 12, 2010

St. Mac magazine Premier Edition

I was doing the holiday cleaning recently and came across a blast from the past. Long ago, when I worked for NASA at the Kennedy Space Center, we were doing some work with an Apple Lisa and an Apple ///. Naturally, when the Macintosh first appeared, it created quite a stir. Along with the rush of announcements and press was the creation of a new magazine, St. Mac. I dusted off a copy of the premier edition. Apparently the magazine didn't last long, only 7 issues, but it brought back a lot of memories. My, how computing has changed since then... why kids today can't even remember when pixels weren't square and how the Mac saved mankind from a world of ellipses.

The cover shows the second generation Lisa. You can tell because the first generation used twiggy drives and the second used the new removable storage rage, 3.5" floppy disks.

The table of contents includes a picture of a young Bill Atkinson. Bill made square pixels cool.

Saturday, January 9, 2010

I/O Reduction and the ZIL

I came across an interesting microbenchmark this week. It shows that some workloads produce confusing results, or head fakes, that make benchmark results difficult to understand. In this case, a method we use for finding the performance envelope for ZFS is not effective.

Before I dive into the microbenchmark, a few words about the ZFS Intent Log (ZIL). ZFS is a transactional file system, which means that it collects I/O into a transaction group (txg) and commits that txg to persistent storage. In later ZFS implementations, that txg commit occurs every 30 seconds. However, if an application needs to ensure that an I/O is written to persistent storage immediately, often called synchronous writes (though that is arguably not the best descriptive term), then waiting for up to 30 seconds is not an option. This is where the ZIL enters the picture. In the synchronous write case, ZFS will write the record to the ZIL and later commit the record with the txg. This ensures the synchronous write agreement between the application and ZFS is honored -- a good thing. Neil Perrin offers a more detailed description in his famous lumberjack blog posting.
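To make the distinction concrete, here is a minimal sketch of buffered versus synchronous writes from the command line. It assumes GNU dd, whose `oflag=dsync` opens the output file with O_DSYNC; other dd implementations may not accept that flag. The scratch file name is hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes GNU dd (oflag=dsync is a GNU extension).
# Hypothetical scratch file; override by passing a path as the first argument.
target=${1:-${TMPDIR:-/tmp}/syncdemo.dat}

# Asynchronous (buffered) writes: dd returns once the data is in memory,
# and the file system commits it later (for ZFS, with a txg).
dd if=/dev/zero of="$target" bs=8k count=100 2>/dev/null

# Synchronous writes: each 8 KB write must reach stable storage before
# the next one is issued. On ZFS, these are the writes handled by the ZIL.
dd if=/dev/zero of="$target" bs=8k count=100 oflag=dsync 2>/dev/null
```

Both commands write the same 800 KB, but on most storage the second takes noticeably longer because every write waits for the device.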

Synchronous writes are the bane of high performance. Really. We see this every day. It causes performance guys to gnash their teeth and cuss. When a microbenchmark wanders along and does a lot of synchronous writes, complaints about how " sucks" and "I can't believe those file system developers could be so insensitive" come pouring forth.

To determine the performance envelope of a benchmark, it is relatively easy to disable the ZIL. This is neither a safe nor recommended option for production systems or people who like their data. But for benchmarking, it allows a performance engineer to quickly determine the best possible performance for the given system configuration. The ZIL is then re-enabled and the work can concentrate on how to approach that performance goal. Tools like zilstat are designed to help with this endeavor, and can save you a lot of time when you suspect synchronous write performance might be an issue.

But disabling the ZIL can also hide important behavior. That is why this microbenchmark could be a poster child for benchmarking that doesn't do what you expect. Here it is:

while true; do
    echo "blah" > outputfile
done

When run on a Solaris NFS client with a Solaris NFS server using default NFS settings, this will cause the following to occur:
  1. outputfile is LOOKUPed
  2. outputfile is OPENed
  3. ACCESS to outputfile is checked
  4. The data is written to the file with WRITE
  5. The data is COMMITted
  6. outputfile is CLOSEd
This will also, by default, cause the file to be synchronously written, the so-called "sync-on-close" operation.

Argv! This simple microbenchmark actually makes a lot of synchronous writes to the file system. zilstat will happily show that the ZIL is working hard when running this microbenchmark. If you run this, then you can experiment with various pool or separate (ZIL) log configurations to your heart's content.

However, if you disable the ZIL, then the number of I/O operations is reduced to just a handful every 30 seconds. Why? Because ZFS is clever enough to recognize that the same file is being overlaid and is only concerned with physically committing the last one in the transaction group. In other words, the amount of I/O traffic to the pool is dramatically reduced. When this happens, you are no longer measuring just the effect of ZIL I/O; you are also measuring the effect of the main pool I/O. The results look something like this:
  1. ZIL enabled, no separate log = 100 iterations/second
  2. ZIL enabled, separate log on a fast SSD = 1,000 iterations/second
  3. ZIL disabled = 10,000 iterations/second
In other words, the effect of eliminating the pool I/O in addition to the ZIL I/O made the system faster! Hurray! But wait just a dog-gone second. That means that the benchmark is basically useless -- it does almost zero physical I/O when the ZIL is disabled. This is kinda like redirecting all of the data to /dev/null -- a fun trick to amuse your friends at parties, but otherwise completely useless.
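Iteration rates like the ones listed above can be collected with a bounded variant of the loop. This is a rough sketch, not the exact harness behind those numbers: the function name `bench_overwrite` is hypothetical, and it assumes a `date` that supports `+%s` (GNU and most modern systems do). Timing is only good to one second, so use a large iteration count.

```shell
#!/bin/sh
# Hypothetical helper: overwrite a file a fixed number of times and
# print the approximate number of iterations per second.
# Usage: bench_overwrite <file> <iterations>
bench_overwrite() {
    outfile=$1
    count=$2
    start=$(date +%s)            # assumes date supports +%s (epoch seconds)
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "blah" > "$outfile" # same overwrite as the original loop
        i=$((i + 1))
    done
    end=$(date +%s)
    elapsed=$((end - start))
    # Clamp to 1 second so very short runs do not divide by zero.
    [ "$elapsed" -gt 0 ] || elapsed=1
    echo $((count / elapsed))
}
```

Running something like `bench_overwrite /mnt/nfs/outputfile 100000` (the path is a stand-in) on the NFS client under each configuration gives a number to compare. Just remember what the rest of this post says about how much that number really tells you.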

The moral of this tale is: beware of microbenchmarks and how they can confuse your understanding of the real system behavior.

P.S. Don't disable the ZIL.

P.P.S. I really mean it, don't disable the ZIL. Seriously. I might cut you some slack for benchmark purposes, but other than that, don't disable the ZIL. Period. End of discussion.

Wednesday, January 6, 2010

My First Macadamia

A few years ago I planted some macadamia trees in the orchard. It takes a few years before they begin to bear fruit. 2009 was the first year that we had blooms. I was very excited!
This small cluster of blooms produced 6 nuts. Of these, all disappeared over the summer (I blame the ravens) except one. Last week, I harvested the one, special nut. I brought the nut inside to take pictures, but the new kitten, Jellyfish, was convinced that it was her personal playtoy.

Jellyfish attacks the unsuspecting nut!

I had to act fast. Armed with a bit of foam pipe insulation and a hammer, I whacked the nut and extracted the precious seed.

Yummy! And yes, that is a cow floating above my head. This is a ranch, after all.
Nothin' left but the shell. Next year I'm expecting more blooms and nuts. Wish me luck.

Tuesday, January 5, 2010

ZFS Tutorial at the USENIX LISA09 slides posted

I have posted my slides for the ZFS Tutorial at the USENIX LISA09 conference. I apologize for the delay; I've been fighting the beast trying to get the PDF uploaded. I finally gave up and uploaded the Keynote presentation. As such, I've also disabled the file download. If you want a copy of the PDF, drop me an e-mail and I'll send it to you.

These slides are the full deck. During the presentation, I used a subset of the slides, but made the full deck available to the attendees. I hope you find these useful.

If you'd like to attend one of my ZFS tutorials, then you can also drop me a line. I have several in the planning stage and hopefully we can schedule something nearby.