On HTML 5 Drag and Drop

HTML 5 is shaping up to be quite an impressive step up from the capabilities web developers are currently constrained to. One of my favorite new features in the spec is support for native drag and drop. Cappuccino and many other JavaScript libraries have had drag and drop support for quite a while now, but with one important caveat: drag operations were limited to the browser window. This was not only visually displeasing, but also prevented you from sharing data in a user-friendly way from one web app to another, or to desktop apps. HTML 5 aims to change all this by giving us access to the computer’s native drag system and clipboard. I spent the last week really familiarizing myself with this API and its various implementations in current browsers so I could start adding support for it in Cappuccino. I feel this gave me a fairly unique perspective on the current state of the feature, which I’d like to share, mainly because I’ve had to make it work in a number of real (sometimes shipping) applications rather than simply building small demos. The good news is that last night I landed my first commit, which adds full HTML 5 drag and drop support for Safari and other WebKit-based browsers. Here is a short movie that shows this feature in action in our internal 280 Slides builds:

As you can see, this feature enables you to easily share data, whether it be images and shapes or full slides, from one presentation to another. What’s particularly cool about this is that you won’t have to change your existing code at all since Cappuccino simply detects when you are on a compliant browser and magically “upgrades” to native drag and drop. On older browsers, you will still get the old in-browser implementation. Ah, the beauty of abstraction.
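Cappuccino’s actual detection code isn’t shown here, but the general idea of checking for native support and falling back otherwise takes only a few lines. The sketch below uses the widely used feature test popularized by Modernizr, which is not necessarily the one Cappuccino uses. The function names are hypothetical. Because the test only inspects properties, plain objects can stand in for DOM elements.

```javascript
// Feature test in the style popularized by Modernizr (not necessarily what
// Cappuccino does): an element exposing a `draggable` property, or both
// `ondragstart` and `ondrop` handler properties, is taken to support native
// HTML 5 drag and drop.
function supportsNativeDragAndDrop(element) {
  return "draggable" in element || ("ondragstart" in element && "ondrop" in element);
}

// Decide once which implementation to use, so application code never changes.
function chooseDragImplementation(element) {
  return supportsNativeDragAndDrop(element) ? "native" : "in-browser";
}

// In a browser you would pass document.createElement("div"); plain objects
// stand in for elements here.
console.log(chooseDragImplementation({ draggable: false })); // native
console.log(chooseDragImplementation({}));                   // in-browser
```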

This isn’t to say that working with this feature was all peaches and cream, though. For starters, the feature is far from complete in any browser: I ran into a tremendous number of bugs, crashes, and inconsistencies in every browser I tried. On the one hand, I got to play with a very exciting new toy; on the other, I was given a glimpse of the bugs I will be dealing with for years to come (just when we thought the whole cross-browser thing was starting to become manageable). This isn’t surprising, of course: it is a very new addition, and the spec isn’t even 100% complete yet. For this reason, I’ve decided to split this post into two pieces. Here I will discuss what I believe to be actual, serious design flaws in the current API, as well as a few suggestions for how they might be remedied. Separately, I will link to a page listing all the bugs and inconsistencies I discovered (with the associated tickets I filed), along with workarounds where I could find them.

I believe the main “theme” of the problems I encountered stems from the fact that I am trying to build full-blown applications rather than dynamic web pages. This, however, is no excuse, as one of HTML 5’s supposed goals is to usher in an era of web apps that can compete with desktop apps. This is precisely why Google is supporting it so heavily.

Lazy Data Loading

One of the key facilities of drag and drop is the ability to provide, and receive, multiple representations of the same data. Different web pages, web apps, and desktop apps support different kinds of data, so it is up to your application to give them something they can work with. Take 280 Slides for example: when a user drags slides out of the slides navigator, he may be planning to drop them in any number of locations. If he is dragging them from one instance of 280 Slides to another, then we want to provide a serialized version of the slides so that they can be added to the other presentation. If, however, he drags the slides into a program like Photoshop, then we want to provide image data. If he drags them to his desktop, perhaps we could provide a PDF version. He could even drag them into his text editor and expect the text contents of his slides to be pasted.
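On the receiving end, each drop target has to pick the one representation it understands from whatever the source offered. Here is a minimal sketch of that choice. It assumes a hypothetical pickBestType helper and illustrative type strings. “slides” stands in for an application-defined type. In a browser, the offered list would typically come from event.dataTransfer.types, converted with Array.from if needed.

```javascript
// Hypothetical helper: given the types a drag source offers and the types a
// drop target understands (in order of preference), return the first match.
function pickBestType(offered, preferred) {
  const available = new Set(offered);
  for (const type of preferred) {
    if (available.has(type)) {
      return type;
    }
  }
  return null; // nothing usable: the target should reject the drop
}

// A 280 Slides-style source offers several representations of the same slides.
const offered = ["slides", "image/png", "application/pdf", "text/plain"];

// Each destination asks for what it understands, best first.
console.log(pickBestType(offered, ["slides", "text/plain"]));   // slides
console.log(pickBestType(offered, ["image/png", "image/jpeg"])); // image/png
console.log(pickBestType(offered, ["text/html"]));               // null
```

Only one of the offered representations is ever used by a given drop, which is exactly why producing all of them up front is wasteful.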

Multiple Data Types

The way you do this currently is with the setData function, which allows you to specify different types of data:

document.addEventListener("dragstart", function(event)
{
   // Offer several representations of the same slides; the drop target picks one.
   event.dataTransfer.setData("image/png", slides.imageRep());
   event.dataTransfer.setData("slides", slides.serializedRep());
   // etc.
}, false);

This is incredibly common on the desktop, and you’ve probably never noticed it precisely because it works so well: things just seem to do the right thing when you drag and drop them. However, an unfortunate side effect of this approach is that you end up doing a lot of unnecessary work. The user only ever drops the item in one location, so all the other formats you created were wasted processing time. This is not a big deal for simple cases of drag and drop, but it becomes quite noticeable in large applications like 280 Slides. In the example above, creating serialized and image representations of the slides can become quite slow, depending on how many elements are in the individual slides and how many slides you are moving. Because of this, you may experience a lag when you first drag the slides out. The worst part is that if all you intended to do was reposition the slides within the same presentation, you didn’t need any of these formats!

This problem was solved in a very simple and intelligent way on the desktop a long time ago: delay supplying the actual data until the drop occurs. At the point of the drop, you actually know which of the supplied types the user is interested in, so you create only that one. Not only does this save you from doing unnecessary work, but users generally notice time spent processing after a drop much less (because there is no expected feedback to stutter). I’ve thought a lot about a good way to let developers do this with the existing setData method, and I think it could be done by simply allowing them to provide functions that are called when the data is needed:

event.dataTransfer.setData("slides", function()
{
   return costlySerialization();
} );

Perhaps a more backwards compatible alternative would be:

event.dataTransfer.setData("slides", { toString: function()
{
   return costlySerialization();
} } );

That said, I don’t think backwards compatibility is really a concern here, since this API is so new. Either way, this lets us keep using the existing setData method without having to calculate the string value until getData is actually called by the drop target.
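Since no browser implements this yet, here is a small model of the proposed behavior. It assumes a hypothetical LazyDataTransfer class, which is not a real browser API. Its setData accepts either a string or a provider. A provider is only invoked the first time getData asks for its type, and the result is cached. Because values are converted with String(), the toString-object variant works here as well.

```javascript
// Hypothetical stand-in for the PROPOSED behavior (not a real browser API):
// setData accepts a string, a zero-argument function, or an object with
// toString; the expensive value is only computed when getData asks for it.
class LazyDataTransfer {
  constructor() {
    this.providers = new Map(); // type -> string | function | object
    this.cache = new Map();     // type -> already-computed string
  }

  setData(type, value) {
    this.providers.set(type, value);
    this.cache.delete(type);
  }

  get types() {
    return Array.from(this.providers.keys());
  }

  getData(type) {
    if (this.cache.has(type)) {
      return this.cache.get(type);
    }
    const value = this.providers.get(type);
    if (value === undefined) {
      return ""; // like the real DataTransfer.getData for a missing type
    }
    // Functions are called; strings and toString objects go through String().
    const data = typeof value === "function" ? String(value()) : String(value);
    this.cache.set(type, data);
    return data;
  }
}

let serializations = 0;
function costlySerialization() {
  serializations += 1;
  return JSON.stringify({ slides: ["Title", "Agenda"] });
}

const transfer = new LazyDataTransfer();
transfer.setData("text/plain", "Title\nAgenda"); // cheap, supplied eagerly
transfer.setData("slides", costlySerialization); // expensive, supplied lazily

console.log(serializations); // 0
transfer.getData("slides");
transfer.getData("slides");
console.log(serializations); // 1
```

Nothing is serialized at dragstart, and a drag that never leaves the presentation never pays for the serialization at all.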

Initiating Drags

Another major hurdle I encountered was controlling the way drags are actually started. Currently this is a delicate dance of preventDefault calls and interactions between mousedown, mousemove, and dragstart, combined with the draggable HTML attribute. The basic problem is that it leaves the decision to start a drag entirely to the browser. Again, this is fine for simple cases, but it really starts to break down when you are building full-on applications in the browser. Frameworks like Cocoa, on the other hand, allow the developer to initiate the drag sequence. Let’s look at why this is important with a simple example. It is quite common to want to start a drag on the initial mouse down instead of waiting for additional mouse move events, since in some widgets it would be confusing if the initial mouse down did nothing. This is currently impossible to achieve with the HTML 5 drag and drop APIs. In Cocoa, it would be quite simple, requiring only that the developer start the process in mouseDown: instead of mouseDragged:

- (void)mouseDown:(NSEvent *)anEvent
{
   // dragImage:at:offset:event:pasteboard:source:slideBack: is an NSView instance method
   [self dragImage:myImage /*...*/];
}

This is just a simple example, of course. More complex widgets provide even more cases where drag and drop in the browser really works against you. Take tables in Mac OS X, which behave differently depending on which direction the user drags in:

As you can see, when a user drags upwards in a table on Mac OS X, the table’s selection changes (in other words, no drag takes place). On the other hand, if the user drags left, right, or diagonally, he is allowed to move the files. This is a very intuitive experience when you use it, and it is absolutely trivial to implement in Cocoa:

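- (void)mouseDragged:(NSEvent *)anEvent
{
   // mouseDownLocation is assumed to have been saved in mouseDown:
   CGFloat deltaX = [anEvent locationInWindow].x - mouseDownLocation.x;

   // Horizontal or diagonal movement starts a drag; vertical movement changes the selection
   if (fabs(deltaX) > 10)
       [self dragImage:myImage /*...*/];
   else
       [self modifySelection];
}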
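The direction test itself is just as easy to express in JavaScript. This sketch uses a hypothetical classifyGesture function and an assumed 10-pixel threshold. In a page, start and current would come from the clientX/clientY of the mousedown and mousemove events. What HTML 5 lacks is a way to act on a “drag” result.

```javascript
// Hypothetical helper mirroring the Mac OS X table behavior: mostly vertical
// movement changes the selection, while horizontal or diagonal movement past
// the threshold should start a drag.
function classifyGesture(start, current, threshold = 10) {
  const dx = current.x - start.x;
  const dy = current.y - start.y;

  if (Math.abs(dx) <= threshold && Math.abs(dy) <= threshold) {
    return "pending"; // too little movement to decide yet
  }
  // Any significant horizontal component (left, right, or diagonal) means a drag.
  return Math.abs(dx) > threshold ? "drag" : "select";
}

const down = { x: 100, y: 100 };
console.log(classifyGesture(down, { x: 103, y: 104 })); // pending
console.log(classifyGesture(down, { x: 102, y: 60 }));  // select (upwards)
console.log(classifyGesture(down, { x: 130, y: 80 }));  // drag (diagonal)
```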

However, this is again basically impossible with the current HTML 5 API, since you can never take part in the decision of whether an object is dragged or not. Once you get the dragstart event, it’s too late. You can imagine that this becomes even more cumbersome in applications like Bespin, which revolve less around specific tags and more around content drawn to a canvas element. When a user drags in Bespin, the application has to decide between any number of actions. I think a good solution would be to simply allow the developer to manually kick off a drag event loop from either a mousedown or mousemove callback. Something like this:

document.addEventListener("mousedown", function(event)
{
   event.startDrag();
}, false);

document.addEventListener("mousemove", function(event)
{
   if (someCondition)
       event.startDrag();
}, false);

In both cases, calling startDrag would result in no further mousemove or mouseup events being fired in this event loop, and would instead kick off the drag event loop with a “dragstart” event. A matching cancelDrag() could be provided as well, letting you cancel a drag without also canceling other default behavior such as text selection. Currently, calling preventDefault cancels both drags and selection, which leads to a number of other confusing results. For example, if you place a text field inside a draggable element, it is essentially impossible to select text in that text field, even if you set the text field itself to not be draggable.
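To make the proposed event flow concrete, here is a toy model of it. ProposedDragLoop, startDrag, and cancelDrag are hypothetical, and no browser implements them. For simplicity, the model assumes every drag ends in a drop. Raw mouse events are fed in one at a time, and the listener receives whatever the page would see.

```javascript
// Toy model of the PROPOSED startDrag()/cancelDrag() semantics (hypothetical;
// not implemented by any browser). `listener(type, loop)` receives every event
// the page would see and may call loop.startDrag() from mousedown or mousemove.
class ProposedDragLoop {
  constructor(listener) {
    this.listener = listener;
    this.state = "idle"; // "idle" | "pressed" | "dragging"
  }

  // Feed one raw mouse event: "mousedown", "mousemove", or "mouseup".
  feed(type) {
    if (this.state === "dragging") {
      // Once a drag has started, no more mousemove/mouseup events reach the
      // page; the drag loop reports drag events instead (every drag is assumed
      // to end in a drop, for simplicity).
      if (type === "mousemove") {
        this.listener("drag", this);
      } else if (type === "mouseup") {
        this.state = "idle";
        this.listener("drop", this);
        this.listener("dragend", this);
      }
      return;
    }
    if (type === "mousedown") this.state = "pressed";
    if (type === "mouseup") this.state = "idle";
    this.listener(type, this);
  }

  // Starts the drag loop; only meaningful while the mouse button is down.
  startDrag() {
    if (this.state !== "pressed") return false;
    this.state = "dragging";
    this.listener("dragstart", this);
    return true;
  }

  // Cancels only the drag; ordinary mouse events (and selection) resume.
  cancelDrag() {
    if (this.state !== "dragging") return false;
    this.state = "pressed";
    this.listener("dragend", this);
    return true;
  }
}

const log = [];
const loop = new ProposedDragLoop((type, session) => {
  log.push(type);
  // Start dragging on the initial mouse down, as mouseDown: can in Cocoa.
  if (type === "mousedown") session.startDrag();
});

["mousedown", "mousemove", "mousemove", "mouseup"].forEach((t) => loop.feed(t));
console.log(log.join(" ")); // mousedown dragstart drag drag drop dragend
```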

Drag Images

One of the nice parts about drag and drop is that you can set any arbitrary image or element to be rendered during the drag with the setDragImage method:

dragEvent.dataTransfer.setDragImage(aDOMElement, offsetX, offsetY);

However, Firefox requires that this element already be visible. I wasn’t sure whether to list this as simply a bug in Firefox or as an actual design flaw, but I chose the latter because the documentation at mozilla.org seems to suggest that they may consider this to be “correct behavior”. Safari does not have this restriction, and Firefox itself seems to make an exception for canvas elements. Firefox is particularly strict about this requirement, too: I tried positioning an element offscreen at a negative position, setting its visibility to hidden, setting its display to none, and even placing it in an offscreen iframe, anything to avoid flashing the element in some random portion of the screen before dragging it. It seems to me that this method exists precisely for the purpose of showing something different, so it’s a bit unreasonable to expect the element to be not only in the document already, but visible as well. My request here is simple: it should just work the way it does in Safari.
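Whichever element ends up being used, the offsetX and offsetY arguments to setDragImage position the image relative to the pointer. Here is a small sketch of the usual calculation, assuming a hypothetical dragImageOffset helper. In a dragstart handler, the pointer would be event.clientX/clientY and the rectangle would come from element.getBoundingClientRect(), both in viewport coordinates.

```javascript
// Hypothetical helper: compute setDragImage's offset arguments so the drag
// image stays under the pointer at the spot where the user grabbed it.
// Rounded because bounding rectangles can have fractional coordinates.
function dragImageOffset(pointer, rect) {
  return {
    x: Math.round(pointer.x - rect.left),
    y: Math.round(pointer.y - rect.top),
  };
}

const offset = dragImageOffset({ x: 250, y: 140 }, { left: 200, top: 100 });
console.log(offset.x, offset.y); // 50 40
// In a browser: event.dataTransfer.setDragImage(element, offset.x, offset.y);
```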

Conclusion

Drag and drop is an incredibly important part of the way we interact with computers, which is why it is so crucial that we get it right from the beginning. I really hope my concerns are heard and that we can come up with good solutions to the initial problems I faced with this young API, so that we can avoid the windows of incompatibility that plagued the last updates to HTML. In the meantime, I’ve filed a bunch of bugs and documented my current experiences here.

  • HTML 5 is like wrapping aluminum foil around a pig. It's shiny but in the end it's still wrinkled up aluminum foil wrapped around a fat pig. In the end developers will still rely on this pig to feed themselves.

    The DOM is a mess, HTML has become a mess, CSS is a mess, ECMAScript is a mess; everything about developing applications in the browser is a complete mess. But in the end this is what we have as a foundation to build our applications on. However bad the foundation might be, the end user doesn't care; they just want desktop-quality applications that run in the browser.

    Application frameworks like Cappuccino and SproutCore are being put in place now to make ridiculously simple tasks, which web technologies have made ridiculously complicated, simple again.

    The idea behind HTML5 is to make the browser more capable of actually running applications. We keep building off unstable foundations and now we're running into the problems mentioned here.

    The HTML5 standard is bringing a great deal of new technologies to the browser, but eventually these problems are going to have to be addressed. Eventually we're going to have to cook this pig wrapped in foil so that the only thing left are the good parts... so that it comes out as barbecue. If not we're always going to be building more frameworks on top of awful foundations to abstract ourselves.
  • In case you haven't noticed it: The whole internet, from top to bottom, is a series of nested foil-wrapped pigs. Many people have expressed a desire to cook the thing, in whole or in part, but no one's managed to lever the thing into a big enough oven yet.

    Grumble if you like, but what you're observing is the best process possible for the web—because it's the only one that's managed to cause progress of any kind. If you think you can do a better job of herding cats and applying dynamite to foundations, more power to you.
  • Except our progress is uselessly delayed. While we're all slaving over supporting IE, we slowly rebuild technologies that already exist so that they will be abstracted from browser bugs. Drag and drop, for example, will probably continue to be implemented in JavaScript in jQuery, Cappuccino, SproutCore, etc. until the end of time for abstraction's sake, while the native HTML APIs go unused. Because we have to re-engineer things that are already there, we are slowing progress. There has never been another process for progress on the web.

    Now, whether I like it or not, the browser is the future of applications; I just wish the foundation were a bit more stable.
  • That evolution is slow is conventional wisdom. I think Steven Den Beste said it best:

    In the long run, Darwinian evolution sounds like a good idea. But like adventure, it’s better read about than experienced, because the only way it can really happen rapidly is with a high death rate.
  • This seems like useful experience; perhaps you could post it to the whatwg mailing list or the W3C public-html mailing list.
  • It's quite useless on WebKit if you're implementing a generic dragstart handler and need to use dataTransfer.getData to find out what data is being dragged to begin with. The bug is explained in WebKit Bug 23695 (https://bugs.webkit.org/show_bug.cgi?id=23695).
  • alos
    Interesting post! The D&D between browser windows is awesome!
  • Laurent
    No, the D&D between browser windows is useless
  • alos
    That is very shortsighted of you. Can you imagine dragging code from your Bespin to an article being written in Google Docs in another window? Moving your slides from 280 Slides to a presentation being built in another web app? Moving calendar or contact info between apps? D&D is very interesting, and I'm sure dragging between browser windows is just the start.
  • jax
    For me the exciting thing about native drag and drop, unlike the JS library emulation, is that you get drag and drop that really works with your environment. You can use your keyboard for drag and drop, have drag and drop on your phone (most phones don't come with an attached mouse), have voice-controlled drag and drop, or, in systems like the Nintendo Wii, have quite literal drag and drop support.
  • gazhay
    HTML5 is never going to be the answer for a couple of good reasons.

    The browser manufacturers only implement it while it is good for them - as soon as HTML5 (WHATWG or WWW) mandate something they don't want to do - it breaks.

    HTML is a mark-up language that has been bullied and abused into a "solution" to deliver web applications to users. That shouldn't need explaining, it's like using a bar of chocolate as a screwdriver.

    HTML5 doesn't attempt to clean up "tag-soup" or completely badly written HTML, we still have engines 'guessing' at what authors meant.

    And finally, a number of people "behind" the standards are academic snobs, who look down on anyone else. (Hixie is *not* one of them)
  • Name
    At least some people try. Complaining that it will never work is exactly what I would expect from an "academic snob". At least these guys dare to innovate!
  • J. King
    Gee, why don't you trot out more baseless, unsubstantiated accusations while you're at it?

    That reality (browser vendors are the ones who make browsers. Amazing!) complicates getting the perfect outcome shouldn't surprise you, and it shouldn't discourage you, either. The endless march forward of technology is littered with imperfect solutions to complicated problems, with hacks layered upon each other neck-deep that enough effort and code have actually turned into something useful. It's nothing new, and it's not bad. If you think you can come up with another solution that actually has a chance in hell of getting adopted by thousands of corporations and millions of individuals in hundreds of countries on a sane timeframe, you're probably wrong, anyway.

    For the record HTML5 -does- attempt to 'clean up "tag-soup"': read Section 9. A staggering amount of time has gone into making sure that future agents get a consistent, interoperable parse tree in standard mode while also mirroring existing behaviour as much as possible; ignoring all this hard work is folly.
  • gazhay
    You both think I haven't been involved - I have.
    I was on the WHATWG and W3C mailing lists, and was insulted off-list by numerous "academic" snobs. The procedure is flawed, as is the use of HTML5 for web applications. Neither of you argued away that point; instead you said it's the best we can do. Settling for second best is bull.

    Mr. King, you also come from a silly point of view, Adobe Air, Silverlight and Flash all attempt to do what you suggest, so saying "getting adopted by thousands of corporations and millions of individuals in hundreds of countries on a sane timeframe, you're probably wrong, anyway." is just plain pathetic. Take your delusions elsewhere.
  • Part of the problem, apparently, is that HTML drag-n-drop is *not* a new API. It's been in MSIE for about ten years, and is being standardized after the fact. What's in the spec can't really be changed in incompatible ways because there are a lot of websites that support the MSIE API.

    (Note: I have no direct knowledge of DnD in MSIE, but this was the answer given by Hixie to some recent comments on the API's deficiencies on the WHATWG list.)

    I'm actually doing a bit of work on DnD support in Chrome and WebKit. I have a WebKit patch out that fixes a bunch of nits with the values of dropEffect and effectAllowed, which will hopefully get reviewed and checked in soon.
  • Great post! One comment on the delayed drag change idea -- I love it, but one thing it lacks is the ability for the drag target to filter the results. You would need a way to tell the drag event what types are being dragged. Cocoa does this with -[NSPasteboard addTypes:owner:] paired with [NSView registerForDraggedTypes:]. Just something to remember if you are begging people for API changes ;)
  • Great post, good insightful look at the DnD API.

    Like Jens Alfke said, this API has been reverse engineered from Microsoft's own proprietary API and is not new. There are, however, further proposed extensions to the current API, such as the files attribute on the dataTransfer object, which I wrote about at http://www.thecssninja.com/javascript/drag-and-... This allows us to go in the other direction, by dragging from our desktop into the browser.