
Sticky Cookie: Uploading Large Files

Uploading large files in an auto-scale environment.

If you work on sites with a lot of traffic, you know these sites will typically run across multiple web servers that share a single database (or perhaps several databases). Forum One has set up such auto-scaling infrastructures for a number of our clients through AWS. To share a common media repository, these multiple servers will typically use a networked file system. Such environments present a range of complex issues that you won't encounter on a site running on a single server. In this post, we will tackle one such problem.

Resumable uploading

Uploading large files, like video or audio, is subject to PHP timeouts. One way around this issue is to break the file into pieces and use chunked (or resumable) uploading. Several tools provide this functionality; for Drupal, there is the file_resup module. However, in an auto-scaling environment, this runs into a new problem: getting every chunk of the upload to the right server.
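To make the idea of chunking concrete, here is a minimal sketch of how a client or server might work out the byte ranges for each piece. The function name `chunk_ranges()` and the 4 MB chunk size are our own assumptions for illustration; they are not taken from file_resup or any other module.

```php
<?php
/**
 * Hypothetical helper: split a file of $size bytes into [offset, length] pairs.
 */
function chunk_ranges(int $size, int $chunk_size): array {
  if ($chunk_size <= 0) {
    throw new InvalidArgumentException('Chunk size must be positive.');
  }
  $ranges = [];
  for ($offset = 0; $offset < $size; $offset += $chunk_size) {
    // The last chunk may be shorter than the rest.
    $ranges[] = [$offset, min($chunk_size, $size - $offset)];
  }
  return $ranges;
}

// A 10 MB file sent in 4 MB chunks needs three separate requests.
$ranges = chunk_ranges(10 * 1024 * 1024, 4 * 1024 * 1024);
```

Each of those requests is short enough to finish well within a PHP timeout, but each one is also a separate HTTP request that the load balancer is free to route independently.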

Resumable uploading and non-locking file systems

Ideally, you could just use a shared filesystem, with all web-heads uploading data to the same place. However, performing a resumable upload with a service like S3 as the destination presents its own challenge: a curious issue causes large uploads to slow down as the upload progresses. A thread on Hacker News describes the issue, and an issue has also been filed against the Drupal module.

A short story makes a useful analogy:

A man is given the job of painting the lines down a road. On the first day, he paints a mile. On the second day, he paints only half a mile; on the third day, only a quarter of a mile; and so on. When asked why his progress is becoming so slow, he replies, "The paint can keeps getting further away."
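To put rough numbers on the analogy, here is a small sketch. It rests on an assumption of ours: that the destination cannot append in place, so adding each new chunk means rewriting everything stored so far. The function names are hypothetical and the model is deliberately simplified.

```php
<?php
/**
 * Total data written when every append rewrites the whole partial file.
 * ASSUMPTION: the destination has no in-place append.
 */
function bytes_written_with_rewrite(int $chunks, int $chunk_size): int {
  $total = 0;
  for ($i = 1; $i <= $chunks; $i++) {
    // Adding chunk $i means writing all $i chunks collected so far.
    $total += $i * $chunk_size;
  }
  return $total;
}

/**
 * Total data written when each chunk is simply appended to a local file.
 */
function bytes_written_with_append(int $chunks, int $chunk_size): int {
  return $chunks * $chunk_size;
}

// 100 chunks of 1 MB each, measured in MB.
$rewrite = bytes_written_with_rewrite(100, 1); // 5050
$append = bytes_written_with_append(100, 1);   // 100
```

Under that assumption, a 100 MB upload writes about 5 GB in total, and every chunk is more expensive than the one before it, which is the painter walking back to his paint can.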

The fix is to use a local temporary directory as the initial upload destination, and then have the server move the complete file to S3 once the upload is finished. However, if you combine this with auto-scaling, each chunk is uploaded to the temporary directory of whichever server happens to handle that request. When the chunks are spread across separate servers, you end up with files that have chunks missing. This is very obvious when an image has its bottom half cut off, or when sections of your audio are missing.
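The missing-chunk problem is easy to see in a toy simulation. The sketch below assumes simple round-robin routing, which is our simplification; a real load balancer may distribute requests differently, but the outcome is the same whenever chunks are not pinned to one server. The function name `distribute_chunks()` is hypothetical.

```php
<?php
/**
 * Hypothetical simulation: route chunk requests round-robin across servers
 * and record which chunk numbers land in each server's temporary directory.
 */
function distribute_chunks(int $chunk_count, int $server_count): array {
  $servers = array_fill(0, $server_count, []);
  for ($chunk = 0; $chunk < $chunk_count; $chunk++) {
    $servers[$chunk % $server_count][] = $chunk;
  }
  return $servers;
}

// Six chunks spread over three web servers.
$servers = distribute_chunks(6, 3);
// $servers[0] holds only chunks 0 and 3, so no server can assemble the file.
```

With a single server, or with every request pinned to the same server, the one temporary directory receives all six chunks and the file can be reassembled.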

In our case, we were working with a Drupal site that used some custom fields to handle video and audio files, but this code can be adapted to handle any file field. We developed a small module that gives users a cookie whenever they visit a form containing a file upload element. We then use that cookie to create a sticky session, which directs all requests between that user and the site to a single auto-scale instance.


/**
 * Implements hook_field_widget_form_alter().
 */
function sticky_cookie_field_widget_form_alter(&$element, &$form_state, $context) {
  // Static flag so the cookie is set only once per request.
  $sticky_lock = &drupal_static(__FUNCTION__);
  // Custom field types that handle audio and video files. Add the machine
  // name of any field type here to trigger a sticky session.
  $field_types = array('soundcloud_file_field', 'youtube_file_field');
  if (in_array($context['field']['type'], $field_types) && $sticky_lock === NULL) {
    $sticky_lock = TRUE;
    global $user;
    // user_cookie_save() does not let us set an expiration time, so set the
    // cookie directly. It expires after four hours (14400 seconds).
    setrawcookie('STICKYUPLOAD', rawurlencode($user->uid), REQUEST_TIME + 14400, '/');
  }
}
/**
 * Implements hook_form_BASE_FORM_ID_alter() for node_form.
 */
function sticky_cookie_form_node_form_alter(&$form, &$form_state, $form_id) {
  // Adds submit handler to node forms to remove sticky cookie.
  $form['#submit'][] = 'sticky_cookie_clear_cookie';
}
/**
 * Custom submit handler for node forms to remove sticky cookie.
 */
function sticky_cookie_clear_cookie($form, &$form_state) {
  if (isset($_COOKIE['STICKYUPLOAD'])) {
    // Expire the cookie by giving it an expiration time in the past.
    setrawcookie('STICKYUPLOAD', '', REQUEST_TIME - 3600, '/');
  }
}

This does require some configuration on the AWS side:

1) Configure the AWS Elastic Load Balancer to watch for an application cookie for session stickiness. When the named cookie is sent from the backend through the load balancer, the load balancer assigns its own cookie (AWSELB=...) to maintain sticky state. The load balancer's cookie inherits the lifetime you assign to the application cookie.

Edit stickiness > Enable application generated cookie stickiness. Here we used "STICKYUPLOAD" as the cookie name.

2) Configure the CDN (Fastly in our case, or Varnish in front of the ELB) to respect and pass the AWSELB cookie so that the ELB retains its sticky state. Normally, we strip all cookies except Drupal's session-related cookies at the CDN to improve cache hit rates.


if (req.http.Cookie) {
  set req.http.Cookie = ";" req.http.Cookie;
  set req.http.Cookie = regsuball(req.http.Cookie, "; +", ";");
  set req.http.Cookie = regsuball(req.http.Cookie, ";(SESS[a-z0-9]+|SSESS[a-z0-9]+|NO_CACHE|STICKYUPLOAD|AWSELB)=", "; \1=");
  set req.http.Cookie = regsuball(req.http.Cookie, ";[^ ][^;]*", "");
  set req.http.Cookie = regsuball(req.http.Cookie, "^[; ]+|[; ]+$", "");
  if (req.http.Cookie == "") {
    unset req.http.Cookie;
  }
  else {
    return (pass);
  }
}
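The chain of regular expressions above is terse, so here is a sketch that replays the same five steps in PHP with preg_replace(), which makes it easy to try against sample Cookie headers. The function name `filter_cookie_header()` is hypothetical, the sample cookie values are made up, and the allowlist is abbreviated for illustration. This only mimics the string handling; it does not reproduce Varnish or Fastly behavior such as return (pass).

```php
<?php
/**
 * Hypothetical helper that mirrors the VCL cookie filtering step by step.
 */
function filter_cookie_header(string $cookie): string {
  // Abbreviated allowlist of cookie-name patterns to keep.
  $allowed = 'SESS[a-z0-9]+|NO_CACHE|STICKYUPLOAD|AWSELB';
  // 1. Prefix with ";" so every cookie, including the first, follows one.
  $cookie = ';' . $cookie;
  // 2. Remove the spaces after each ";".
  $cookie = preg_replace('/; +/', ';', $cookie);
  // 3. Put a space back after the ";" of allowlisted cookies only.
  $cookie = preg_replace('/;(' . $allowed . ')=/', '; $1=', $cookie);
  // 4. Delete every cookie that still directly follows a ";".
  $cookie = preg_replace('/;[^ ][^;]*/', '', $cookie);
  // 5. Trim leftover separators from both ends.
  return preg_replace('/^[; ]+|[; ]+$/', '', $cookie);
}

$kept = filter_cookie_header('has_js=1; STICKYUPLOAD=42; AWSELB=abc; _ga=GA1.2');
// $kept is "STICKYUPLOAD=42; AWSELB=abc"
$none = filter_cookie_header('has_js=1; _ga=GA1.2');
// $none is "", the case where the VCL unsets the Cookie header entirely.
```

The trick is in steps 2 to 4: the space after the semicolon is used as a temporary marker for "keep this cookie", so anything without the marker can be removed in a single pass.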

And there you have it: an application plugin that sets your sticky cookie (in our case, a Drupal module that reacts to rendering the field widget for a given field type), plus a small amount of configuration in AWS and in your Varnish .vcl file. Enjoy!
