Mockingbird, Cappuccino, and what really matters.

I found an interesting critique via daring fireball regarding the latest Cappuccino application, Mockingbird. I’m really glad this has come up because I think the concerns are valid and I’m excited that we can start having a conversation about this. One comment in particular really stuck out for me:

If you load the app, you can see custom scrollbars and navigation, a complete lack of accessibility, non-native controls, and all those other things that cause geeks to hate Flash.

This is a classic programmer’s misunderstanding of a design problem. Listen carefully guys: pure native controls aren’t what matters. What users actually care about is broken and ugly controls. This should be incredibly obvious from the fact that many of the most popular applications on the Mac use custom “non native” controls, such as Tweetie, iTunes, Quicktime, every single Apple Pro App, and Acorn just to name a few. In fact, its almost par for the course now adays.

Now, I haven’t heard a single complaint that Tweetie for the Mac doesn’t have blue aqua scrollbars. Why? Because the Tweetie scrollbars work well in this setting and look good. The reason people hated Java’s UI’s wasn’t because they were “non native”, its because they were ridiculously horrendous and behaved poorly to boot. Instead of admitting that what was needed was good designers, programmers simply drew the lazy conclusion that every control had to be drawn by the system to give it some sort of magical properties. This is exactly the problem with Flash. What frustrates me about the scrollers in Balsamiq isn’t simply that they’re different, its that they don’t work with my scrollwheel mouse and look incredibly out of place.

In Cappuccino we’ve taken two important steps: First, we’ve relentlessly implemented all the “native” features of scrollers (and other controls of course) people have come to expect: from command+clicking in the track to respecting horizontal scroll to listening to arrow keys. Have we missed one? Perhaps, in which case you should file a bug. Or better yet, fork the project and ship your fix immediately to your users, something you can’t do with Flash or the built in controls in HTML. Secondly, we’ve hired a real design firm, Sofa, to make a UI that truly looks awesome on the web: Aristo. You should take a look at their Cappuccino application EnStore and try to argue that this thing doesn’t feel great:

The second assertion was the following:

Gruber’s definition of “true web app” and mine greatly differ. Clue: If it’s completely unusable on the iPhone Safari browser, it doesn’t matter if it’s built in JavaScript, Flash or Microsoft Visual Fortran 2012. It’s not a “true web app”

Well, for starters:

Mockingbird on the iPhone

Mockingbird on the iPhone

But let’s get to the real issue here, because this is once again a misunderstanding of design vs. programming. HTML, JS, and CSS do not magically create wonderful experiences on every platform they are run. As you can see from the above screenshot, they certainly have the nice side effect of working on said platforms, but if you’re expecting HTML to somehow handle the subtle and explicit differences between a handheld multitouch peripheral and a desktop application, well then you’re doing it wrong. These are completely different environments and they require completely different designs and often implementations. The reason mockingbird is “completely unusable” on the iPhone despite loading up fine is because it was designed for a large screen. Photoshop written with perfect semantic markup or however you want to define a “true web app” won’t work on a small screen either. Clearly though, the nice side effect of using Cappuccino is that you’ll at least be able to share common source code between both versions of these apps.

I think the fundamental conclusion here is that people get really hung up on the “web” part of web apps, when they should be focusing on the “app” part. At the end of the day, you are delivering your customer an experience. I believe that someday all apps will be web apps, and then this will become much clearer. At that point, what will matter in a mobile web app is the mobile part, and what will matter in the desktop web app is the desktop part.

On HTML 5 Drag and Drop

HTML 5 is shaping up to be quite an impressive step up from the capabilities web developers are currently constrained to. One of my favorite new features provided by the spec is support for native drag and drop. Cappuccino and many other JavaScript libraries have had drag and drop support for quite a while now, but with one important caveat: the drag operations were limited to within the browser window. This was not only visually displeasing, but prevented you from being able to share data in a user friendly way from one web app to another, or even to other desktop apps. HTML 5 aims to change all this by giving us access to the computer’s native drag system and clipboard. I took the last week to really familiarize myself with this API and its various implementations on current browsers so I could start adding support for it in Cappuccino. I feel that this gave me a pretty unique perspective on the current state of this feature which I’d like to share, mainly because I’ve had to make it work in a number of real (sometimes shipping) applications, as opposed to simplying creating small demos. The good news is that last night I was able to land my first commit which adds full HTML 5 drag and drop support for Safari and other WebKit-based browsers. Here is a short movie that shows this feature in action in our internal 280 Slides builds:

As you can see, this feature enables you to easily share data, whether it be images and shapes or full slides, from one presentation to another. What’s particularly cool about this is that you won’t have to change your existing code at all since Cappuccino simply detects when you are on a compliant browser and magically “upgrades” to native drag and drop. On older browsers, you will still get the old in-browser implementation. Ah, the beauty of abstraction.

This isn’t to say that working with this feature was all peaches and cream though. For starters, this feature is far from complete in any browser. I experienced a tremendous amount of bugs, crashes, and inconsistencies in all the browsers I tried. On the one hand, I got to play with a very exciting new toy, and on the other I was given a glimpse into the future of the bugs I would be dealing with for years to come (just when we thought the whole cross-browser thing was starting to become managable). This isn’t surprising of course, it is a very new addition and the spec isn’t even 100% complete yet. For this reason, I’ve decided to split this post up into two pieces. In the following I will be discussing what I believe to be actual and serious design flaws in the current API, as well as a few suggestions I have for how they might be remedied. I will also separately link to a page that has all the bugs and inconsistencies I discovered (as well as the associated tickets I filed on them), and workarounds when I could find them.

I believe the main “theme” of the problems I encountered was due to the fact that I am trying to build full-blown applications as opposed to dynamic web pages. This however is no excuse, as one of HTML 5’s supposed goals is to usher in an era of more web apps that are more competitive with desktop apps. This is precisely why Google is supporting it so heavily.

Lazy Data Loading

One of the key facilities of drag and drop is the ability to provide, and get, multiple representations of the same data. Different web pages, web apps, and desktop apps support different kinds of data, so it is up to your application to give them something they can work with. Take 280 Slides for example: When a user drags the slides out of slides navigator, he may be planning to drop it to any number of locations. If he is dragging it from one instance of 280 Slides to another, then we want to provide a serialized version of these slides so that they can be added to the other presentation. If however, he drags these slides into a program like Photoshop, then we would want to provide image data. If he were to drag them to his desktop, then perhaps we could provide a PDF version. He could even drag them to his text editor and expect the text contents of his slides to be pasted.

Multiple Data Types

Multiple Data Types

The way you do this currently is with the setData function, which allows you to specify different types of data:

document.addEventListener("dragstart", function(event)
{
   event.dataTransfer.setData("image/png", slides.imageRep());
   event.dataTransfer.setData("slides", slides.serializedRep());
   // etc.
}, false)

This is incredibly common on the desktop, and you’ve probably never noticed it precisely because it works so well: things seem to just do the right thing when you drag and drop them. However, an unfortunate side effect of this feature is that you end up doing a lot of extra unecessary work. The user only ever drops the item to one location, and so all the other formats you’ve created were wasted processing time. This is not a big deal for simple cases of drag and drop, but it becomes quite noticable in large applications like 280 Slides. In the example above, creating serialized and image representations of these slides can become quite slow depending on how many elements are in the individual slides and how many slides you are moving. Because of this you may experience a lag when you first drag the slides out. The worst part is, if all you intended to do was reposition the slides in the same presentation, then you didn’t need any of these formats!

This problem was solved in a very simple and intelligent way on the desktop a long time ago: simply delay supplying the actual data until the drop occurs. At the point of the drop, you actually know which of the 5 supplied types the user is interested in, so create it then. Not only does this save you from doing uncessary work, but generally users notice time spent processing after a drop a lot less (because there is no expected user feedback to stutter). I’ve thought a lot about a good way to allow the user to do this with the existing setData method , and I think it could be done by simply allowing developers to provide functions that could be called when the data is needed:

event.dataTransfer.setData("slides", function()
{
   return costlySerialization();
} );

Perhaps a more backwards compatible alternative would be:

event.dataTransfer.setData("slides", { toString: function()
{
   return costlySerialization();
} } );

Although I don’t really think this is necessary since this API is so new. Either way, this allows us to use the existing setData method, while not actually needing to calculate the string value until getData is actually called by the drop target.

Initiating Drags

Another major hurdle I encountered was in controling the way drags are actually started. Currently this is a delicate dance of preventDefaults and interactions between mousedown, mousemove, and dragstart, in combination with the draggable HTML attribute. The basic problem with this is that it leaves the decision to create a drag entirely to the the browser. Again, this is just fine for simple cases, but it really starts to break down when you are building full on applications in the browser. On the other hand, frameworks like Cocoa allow the developer to initiate the actual drag sequence. Lets look at why this is important with a simple example. It is quite common to want to start a drag event on the initial mouse down, instead of waiting for additional mouse move events. In these cases, it would be more confusing if the initial mouse down did nothing. This is currently impossible to achieve with the HTML 5 drag and drop APIs. In Cocoa, this would be quite simple, requiring the developer simply start the process in mouseDown: instead of mouseDragged:

- (void)mouseDown:(NSEvent *)anEvent
{
   [NSView dragImage:myImage /*...*/];
}

This is just a simple example of course. More complex widgets provide even more cases where drag and drop in the browser really works against you. Take tables in Mac OS X, which provide different behaviors depending on what direction the users drags in:

As you can see, when a user drags upwards in a table on Mac OS X, the selection of the table changes (in other words, no drag takes place). On the other hand, if the user drags left, right, or diagonally in any way, then he is allowed to move these files. This is very intuitive experience when you use it, and is absolutely trivial to implement in Cocoa:

- (void)mouseDragged:(NSEvent *)anEvent
{
   if (deltaX > 10)
       [NSView dragIamge:myImage /*...*/];
   else
       [self modifySelection];
}

However, this is again basically impossible with the current HTML 5 API, as you can never be a part of the decision as to whether an object is dragged or not. Once you get the drag event, it’s too late. You can imagine that this would become even more cumbersome in applications like Bespin that revolve less around specific tags and more around content that is drawn to a canvas elements. When a user drag in Bespin, they have to decide between any number of actions. I think a good solution to this would be to simply allow the developer to manually kick off a dragging event loop from either a mousedown or mousemove callback. Something like this:

document.addEventListener("mousedown", function(event)
{
   event.startDrag();
}, false);

document.addEventListener("mousemove", function(event)
{
   if (someCondition)
       event.startDrag();
}, false);

In both these cases, calling startDrag would result in no further mousemoves/mouseups being fired in this event loop, and instead would kick off the drag event loop with a “dragstart” event. A matching cancelDrag() could be provided as well. This would allow you to cancel a drag, but not any other specific behavior such as selection. Currently calling preventDefault cancels both drags and selection. This actually leads to a number of other confusing results. For example, if you place a textfield within a draggable element, it is essentially impossible for text selection to happen in that textfield, even if you set the textfield itself to not be draggable.

Drag Images

One of the nice parts about drag and drop is that you are allowed to set any arbitrary image or element as what is actually rendered during the drag process with the setDragImage method:

dragEvent.dataTransfer.setDragImage(aDOMElement, offsetX, offsetY);

However, on Firefox it is required that this element already be visible. Now, I wasn’t sure whether to list this as simply a bug in Firefox or an actual design flaw, but I chose to list it as a flaw because the documentation at mozilla.org would seem to suggest that they may consider this to be “correct behavior”. Safari does not have this restriction, and in fact Firefox even seems to make an exception for canvas elements. Firefox seems particularly strict about this requirement too, as I tried positioning an element offscreen in a negative position, setting its visibility to hidden, setting the display to none, and even placing the element in an offscreen iframe, anything to prevent having to actually flash the element in some random portion of the screen before dragging it. It seems to me that this method exists for the purpose of showing something different, and thus it’s a bit unreasonable to expect it to already be not only in the document, but visible as well. My request here is simple: that it should simply work the way it does in Safari.

Conclusion

Drag and drop is an incredibly important part of the way we interact with computers, which is why it is so crucial that we get it right from the beginning. I really hope my concerns are heard and that we can come up with some good solutions to the initial problems I faced with this young API, so that we can avoid the windows of incompatibility that plagued the last updates to HTML. In the meanwhile, I’ve filed a bunch of bugs and documented my current experiences here.

Building a Better JavaScript Profiler with WebKit

I had the pleasure of showing off some the cool new features we’ve been adding to the WebKit inspector at JSConf last week. It’s no secret that debugging basically sucks in JavaScript, and until recently, it was a little bit worse in Objective-J. Up until now we’ve focused mainly on adding stop gap measures to our own code, but recently we’ve decided to shift gears and attack the problem head on in the browsers themselves. This is why these past couple of weeks I’ve set aside the JavaScript code and instead focused on working with the great guys on the WebKit team on providing a solid debugging experience both in Objective-J and JavaScript in general. We first decided to focus on profiling, since this is an area of considerable interest for a framework. All the code I’ve committed is now available in the latest WebKit nightly, so if you want you can download it to follow along. I’ve also added to the end of this post links to both the WebKit commits we added, as well as the accompanying code we put in Cappuccino in an effort to show how to make the best use of these new features and encourage others to take a stab at adding some debugging features to WebKit. Surprisingly enough, the folks over at Joose wasted no time incorporating this into their own library, so I’ve included links to their additions as well.

Anonymous and Poorly Named Functions

Had you run a Cappuccino application through Firebug’s profiler back in September when we originally open sourced the framework, you would have probably seen something that looked like this:

Anonymous Functions in Firebug

Anonymous Functions in Firebug

Anyone who’s done a significant amount of profiling with Firebug has probably run into the dreaded question mark functions at some point or another, but as you can see from above, it used to be particularly egregious in Objective-J. The reason these question marks show up is because somewhere the script in question contains an anonymous function. Anonymous functions, or lambdas as they’re sometimes referred to, are functions that you can declare just about anywhere in your program and not bother naming. Take the following code for example:

document.addEventListener("click", function() { /*...*/ }, false);

Here we’re using an anonymous function to perform special behavior on a mouse click event, and when profiled, this function will show up as a question mark. The obvious workaround is to simply declare this function normally somewhere else in the code, but this isn’t always possible because you might need it inline in the code so as to form a closure. So instead the recommended solution today is to simply give it a name with the following syntax:

document.addEventListener("click", function clicked() { /*...*/ }, false);

And in this particular case, this will work quite well and allow this function to appear in profile as clicked. However, there are certain cases where this won’t work. Let’s look at a different snippet of code to see such a case:

function generator(/*Number*/ iterations)
{
    return function()
    {
        for (var i = 0; i < iterations; ++i)
            // do something
    }
}

Here we’ve created a function called generator that creates other functions when executed. As is, these functions will show up as question marks just as before, but this time we can’t simply name them inline because then all the generated functions would show up with the exact same name:

function generator(/*Number*/ iterations)
{
    return function iterate()
    {
        for (var i = 0; i < iterations; ++i)
            // do something
    }
}

Unfortunately, there is really very little we can do to remedy this situation short of using an eval statement, which would change the performance characteristics of this method so drastically that the entire exercise would become moot. It’s not just floating anonymous functions that suffer from poor naming though. Imagine that you have created the following prototypal classes and methods in your application or library:

MyClass.method = function()
{
    // MyClass class method
}

MyClass.prototype.method = function()
{
    // MyClass instance method
}

MyOtherClass.prototype.method = function()
{
    // MyOtherClass instance method
}

Both in Firebug and Safari, this code will generate a largely useless profile:

Profiling object methods in Firebug or Safari

Profiling object methods in Firebug or Safari

This profile is almost as ambiguous as when it was all question marks. We can’t tell whether MyClass.myMethod, MyClass.prototype.myMethod, or MyOtherClass.prototype.myMethod is the bottleneck here. If we aren’t generating these methods in any special way, we could try to name them inline, but we’d have to mangle the names considerably to pack in all the information we need:

MyClass.method = function $_MyClass__class__method()
{
    // MyClass class method
}

MyClass.prototype.method = function $_MyClass__method()
{
    // MyClass instance method
}

MyOtherClass.prototype.method = function $_MyOtherClass__method()
{
    // MyOtherClass instance method
}

This is clearly not the most elegant solution, and doesn’t scale given the fact that you have limited visual room in Firebug and Safari (you actually can’t stretch the function name column in either profiler). This also runs the risk, albeit a small one, of clashing with an existing function name. But the important thing to notice here is that it is not necessarily anonymous functions that are the source of the problem, but the fact that functions don’t actually have real names in JavaScript. It is only the variables that are bound to them that are named. So in order to solve this issue once and for all, we decided to define a way to explicitly give functions a name for debugging: the displayName attribute. In WebKit, you can now simply set this property with any arbitrary name you desire. Let’s revisit our generator example from earlier and see what we can do with this slightly modified code:

function generator(/*Number*/ iterations)
{
    var generated = function()
    {
        for (var i = 0; i < iterations; ++i)
            // do something
    }

    generated.displayName = "iterating " + iterations + " times";

    return generated;
}


If we now rerun this profile in a recent WebKit nightly, we should see something like this:

Explicitly named functions in WebKit Profiles

Explicitly named functions in WebKit Profiles

Each function is now clearly identifiable in the results, allowing us to actually make use of this data. We can extend this same approach to our prototypal classes we defined above to achieve a similar effect:

MyClass.method = function()
{
    // MyClass class method
}

MyClass.method.displayName = "MyClass.method (class)";

MyClass.prototype.method = function()
{
    // MyClass instance method
}

MyClass.prototype.method.displayName = "MyClass.method";

MyOtherClass.prototype.method = function()
{
    // MyOtherClass instance method
}

MyOtherClass.prototype.method.displayName = "MyOtherClass.method";

If we were to profile this now, the much more descriptive displayNames would show up instead of simply seeing method() used in every case. This is the basic idea behind what Objective-J does in the latest Cappuccino 0.7 betas, but it takes place completely automatically behind the scenes, so that with no code changes of your own, applications now look something like this when profiled:

Profiling Objective-J in WebKit

Profiling Objective-J in WebKit

As you can see from this profile, Objective-J now has first class profiling support in WebKit. The best part about this though is that it’s not just limited to Objective-J: any language abstraction now has the opportunity to make the same use of these tools. Objective-J happens to be a great candidate because it is such a thin wrapper around JavaScript, but a project such as processing.js could show the actual processing functions instead of their generated JavaScript analogues, or perhaps GWT could have a flag where it shows the Java methods in the profiler instead of the generated JavaScript as well. We’ve actually taken this one step further in Objective-J though, and used this feature to display information that you actually can’t presently see with normal JavaScript scripts. Currently both Safari and Firebug are incapable of profiling code that doesn’t execute explicitly in a function. This means that if a good portion of your profile is taking place at the top level of a script file, it will be completely left out in Firebug and lumped into the overly generic (program) category in Safari. But thanks to the special way we handle files in Objective-J, we are able to tell our users precisely how much time they are spending in a specific file:

Objective-J is smart about files in profiles

Objective-J profiling is smart about files in WebKit

This is actually what I found most exciting about this seemingly simple property addition. In less than a day I was able to apply it in a completely new way to supply WebKit with even more information than we had originally designed it for. I feel that there is something really interesting in the idea that the code can interact directly with the debugging tools, and its why I believe that despite the debugging situation being so poor in JavaScript today, it has the potential of being much better than that of traditional languages. Expect to see us experiment more with this new kind of debugging here at 280 North in the future, because this is clearly just the tip of the iceberg.

More Fine-Grained Profiling

The other thing we focused heavily on doing these last couple of weeks was completely rewriting the Bottom Up View of the WebKit profiler. To get a better idea of what this is, lets first take look at the other alternative WebKit currently gives you for analyzing your profiles, known as the Top Down View:

Top Down View in Safari

Top Down View in WebKit

The Top Down View shows you a graph of the actual flow of your application, a call stack with the very first functions that were executed as the root nodes and the functions they called as their children. Thus, the data in each row represents the statistics for the call stack starting with the root node, and ending in the child node. I’ve fully expanded all the nodes here to be able to see the entire call graph. If we look at the second to last line of this view, we can see that it represents a recursive call to aFunction that took place from within a call to caller3:

Call stack represented in Top Down View

Call stack represented in Top Down View

We’d read this by saying that 0.41% of the time was spent in 5 calls to aFunction with this call stack. While this representation of your profile certainly gives you a very holistic view of what happened in your program and can help you get a better idea of the general flow of functions taking place, it’s harder to draw conclusions such as which function most of the time is being spent in. To do this, we would need to add up all the individual child times and then compare them to eachother. In this simple example this doesn’t seem that daunting, but you can imagine that it can quickly become quite complex.

This is where the Bottom Up View comes in. Let’s take a look at the same profile using this view:

Bottom Up View Collapsed

Bottom Up View Collapsed

If we leave the children collapsed, this should look very familiar to Firebug users: it is a flat list of every function called in your program, and how much time was spent in each. However, where things really get interesting is when you expand the children:

Bottom Up View Expanded

Bottom Up View Expanded

Unlike in the Top Down View, the children here represent the parents, or callers, of the root function in question. For example, the second row represents the call stack starting at caller3 and ending at aFunction:

The call stack represented in the Bottom Up View

The call stack represented in the Bottom Up View

Because of this, the statistics on each row actually still refer to the original root node, and not the child as in the Top Down View. So on the second row you’d say “1000 calls to aFunction took place originating from caller3″. Essentially we are just flipping the Top Down View on its head. In order to understand why this information is so powerful, let’s take a look at a real world example I recently ran into in Cappuccino. Now, the following is an Objective-J profile, but the principles are exactly the same in normal JavaScript:

Objective-J Profile in Bottom Up View

Objective-J Profile in Bottom Up View

If we were using Firebug or any other flat listing tool, the naive interpretation of this profile would be that setFrameSize: is probably something worth tuning since it is third on our list and takes about 4.58% of our profile’s total time. This diagnosis is not wrong in the strict sense, but we may find it difficult to find out exactly why this method is so slow if we simply jump into setFrameSize:’s implementation and start hacking away. Remember that functions can be quite complex internally as well, and you may spend your time needlessly optimizing a code path in this method that was not even reached during the profile. However, we may get a better idea if we instead inspect this further and look at setFrameSize:’s callers:

Examining setFrameSize:'s callers

Examining setFrameSize:

Interestingly enough, after expanding this node we find that it is not necessarily setFrameSize: which is universally slow, but rather some special interaction between setFrameSize: and its caller sizeToFit. We know this because this method usually takes an average of 0.01% to 0.04%, but specifically when called from sizeToFit it takes a whopping 4.34%, over 200 times as long. Not only that, but all this time is concentrated in just 1 actual call, profiling gold! Perhaps there is something in sizeToFit that is purging a cache that setFrameSize: relies on, or perhaps sizeToFit causes setFrameSize: to take a completely different code path internally than normal. It could be any number of reasons, but we are now empowered with a much better understanding of what exactly is happening in the program that is causing this slowdown. In other words, this allows us to profile not only the functions themselves, but the relationship between functions as well.

What’s Next?

Debugging in JavaScript still has a long way to go. These changes are like night and day for frameworks like Cappuccino, but we have a bunch of other ideas we’d like to get implemented in WebKit’s inspector as well. We also think its important to try to take some of the work we’ve done here and get it placed into Firebug. Given that there is no one browser your code will run in, it is important to have a great set of tools on as many browsers as possible. We’ve used a hacked version of Firebug internally before, and if I recall correctly it shouldn’t be too difficult to add support for the displayName property to function objects, so hopefully we’ll get a patch out for that soon.

Addendum

As promised earlier, I have included a list of links to the WebKit, Cappuccino, and Joose commits below. The Cappuccino and Joose commits should help you integrate support for these new WebKit features in your own application or library, and hopefully the WebKit commits will inspire you to report/fix/write new features for JavaScript debugging:

WebKit

Cappuccino

Joose