Google’s App Engine 1.3.0 was released yesterday along with a brand new Blobstore API allowing the storage and serving of files up to 50MB.
Store and Serve – Files can be uploaded and stored as blobs, to be served later in response to user requests. Developers can build their own organizational structures and access controls on top of blobs.
The way this API works is pretty simple. To upload files you call an API that manufactures a POST URL, and web form requests containing file data are submitted to that URL. App Engine processes the POST request and creates the blobs in its storage (along with BlobInfo objects, which are read-only datastore entities containing the metadata on each blob). It then rewrites the request, removing the uploaded file data and replacing it with a Blobstore key pointing to the stored blob in the App Engine Blobstore, and calls your handler with this data.
To serve an existing blob in your app, you put a special header in the response containing the blob key. App Engine replaces the body of the response with the content of the blob.
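To make the flow concrete, here is a toy, in-memory model of it in Python. It is not the App Engine SDK: the `ToyBlobstore` class, its `handle_post` and `serve` methods, and the `/_toy/upload/` path are all made up for illustration. Only the `create_upload_url` name and the `X-AppEngine-BlobKey` header are borrowed from the Python SDK, and the real service does all of this outside your code.

```python
import uuid

# Header name used by the Python SDK; everything else here is hypothetical.
BLOB_KEY_HEADER = "X-AppEngine-BlobKey"


class ToyBlobstore:
    """In-memory stand-in for the Blobstore, for illustration only."""

    def __init__(self):
        self.blobs = {}    # blob key -> stored bytes
        self.pending = {}  # one-time upload path -> handler to call afterwards

    def create_upload_url(self, success_handler):
        # Manufacture a unique POST URL tied to the app's own handler.
        path = "/_toy/upload/" + uuid.uuid4().hex
        self.pending[path] = success_handler
        return path

    def handle_post(self, path, form):
        # The URL is consumed on first use, so it cannot be replayed.
        handler = self.pending.pop(path)
        rewritten = {}
        for name, value in form.items():
            if isinstance(value, bytes):
                # File data is stored first; the app only ever sees a key.
                key = uuid.uuid4().hex
                self.blobs[key] = value
                rewritten[name] = key
            else:
                rewritten[name] = value
        return handler(rewritten)

    def serve(self, response_headers):
        # Swap the response body for the blob named in the special header.
        return self.blobs[response_headers[BLOB_KEY_HEADER]]


store = ToyBlobstore()
received = {}


def on_upload(form):
    # The app's handler: by now the file field holds a blob key, not bytes.
    received.update(form)
    return "ok"


url = store.create_upload_url(on_upload)
store.handle_post(url, {"title": "cat", "file": b"fake image bytes"})
body = store.serve({BLOB_KEY_HEADER: received["file"]})
```

Note that `on_upload` runs only after the bytes are already in `store.blobs`, which is exactly what the concerns below are about.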
Now this is pretty straightforward, but there are a few concerns with this approach:
1. What about request validation (authentication/authorization etc.)?
When uploading files, the request reaches your code only after the blobs have already been processed and stored. This means that you can only handle authentication/authorization, or even form validation, after the data has been stored.
This means you’ll have to write code to clean up the relevant blob entries in case of failed authentication/authorization/validation, which means more datastore API calls, more CPU…
It also means that without taking care of these special cases, any newbie hacker with a simple sniffer (or Firebug) can start uploading (and potentially serving) files off your service (see update).
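Here is a rough sketch of what that cleanup code looks like. It uses a plain dict as a stand-in for the Blobstore, and `finish_upload`, `is_authorized` and `max_bytes` are names I made up for the example. In a real app the deletion would go through the SDK's own blob delete call rather than `dict.pop`.

```python
def finish_upload(blobs, form, is_authorized, max_bytes=1024 * 1024):
    """Validate an already-stored upload; delete its blob if validation fails.

    blobs: dict mapping blob key -> bytes (hypothetical stand-in for the store)
    form:  the rewritten request, e.g. {"user": "eran", "file": "<blob key>"}
    Returns True if the upload is kept, False if it was rejected and removed.
    """
    key = form.get("file")
    ok = (
        key in blobs
        and is_authorized(form.get("user"))
        and len(blobs[key]) <= max_bytes
    )
    if not ok:
        # The data was written before this code ran, so rejecting the
        # request means explicitly removing what was stored.
        blobs.pop(key, None)
    return ok


blobs = {"k1": b"data", "k2": b"data"}
allowed_users = {"eran"}


def is_authorized(user):
    return user in allowed_users


kept = finish_upload(blobs, {"user": "eran", "file": "k1"}, is_authorized)
rejected = finish_upload(blobs, {"user": "mallory", "file": "k2"}, is_authorized)
```

Every rejected request costs a write followed by a delete, which is the overhead I'm complaining about.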
2. No way to preprocess data
As the file data is already stored before the program’s handler is called, there’s no way to preprocess submitted data other than reading it from the store, processing it and storing it again.
There’s also no straightforward API to access or store blob data in code, so the above process has to be implemented using URL fetching (fetch the image via an HTTP call, process it, store it again using an HTTP POST call).
There must be a way for the Google App Engine team to wrap this up nicely and provide a clean API for doing this efficiently (along with solving the validation problem described before).
As the Blobstore API is still in an experimental phase, I guess we’ll see some quick progress made on its development, and hopefully the Google team will solve the issues above.
At least now there’s the beginning of an alternative to Amazon S3 for App Engine applications.
Update:
Brett Slatkin notes that when the API manufactures the POST URL to be used for uploading the files, it creates a unique one-time URL, which mitigates any potential sniffing.
This fits perfectly for the scenario where you’re rendering a web form to be submitted by the user. But it makes things harder if you’re trying to provide a REST API that allows uploading files (think of something like TwitPic, for example). In this case you’ll have to write your own handler that simulates what a web form would do (get the files, create a random POST URL, call it, …).
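The fiddly part of simulating a web form is building the multipart/form-data body by hand. A minimal sketch of that step, using only the standard library, could look like this. `encode_multipart` and the field names are my own, not part of any App Engine API, and it assumes field names and filenames contain no quotes or line breaks.

```python
import uuid


def encode_multipart(fields, files):
    """Build a multipart/form-data body the way a browser form would.

    fields: dict of name -> str
    files:  dict of name -> (filename, content_type, data_bytes)
    Returns (content_type_header_value, body_bytes).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        # Plain form field: headers, blank line, value.
        parts.append(
            (
                "--%s\r\n"
                'Content-Disposition: form-data; name="%s"\r\n\r\n'
                "%s\r\n" % (boundary, name, value)
            ).encode("utf-8")
        )
    for name, (filename, content_type, data) in files.items():
        # File field: same idea, plus a filename and a content type.
        header = (
            "--%s\r\n"
            'Content-Disposition: form-data; name="%s"; filename="%s"\r\n'
            "Content-Type: %s\r\n\r\n" % (boundary, name, filename, content_type)
        )
        parts.append(header.encode("utf-8") + data + b"\r\n")
    # Closing boundary marks the end of the form.
    parts.append(("--%s--\r\n" % boundary).encode("utf-8"))
    return "multipart/form-data; boundary=" + boundary, b"".join(parts)


content_type, body = encode_multipart(
    {"caption": "hello"},
    {"file": ("pic.jpg", "image/jpeg", b"\xff\xd8\xff")},
)
```

The API handler would then POST `body` to the freshly created upload URL with `content_type` as the Content-Type header, presumably through the URL Fetch service.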
(Cross-posted @ Developer Zen)
Thanks for the write-up!
The upload URLs are one-time URLs, which mitigates any potential sniffing. An application can do authorization checks before an upload form is displayed.
Hi Brett,
Thanks for the response. I didn’t notice the fact that the generated URLs are one-time. But this makes the solution viable only for web forms.
What should API developers do? Let’s say that I want to provide an API for iPhone/Android clients to call (let’s think of TwitPic’s API as an example).
What I need to do is:
* Have the call made send the picture data to my own API handler
* My API handler code handles authentication etc., then generates a one-time upload URL and makes a POST call to that URL with the data
* Another handler is in charge of returning the success of the upload
This should work (I’m writing the code now) but it’s not as easy/straightforward as it could be.
It would have been a lot easier if this process were wrapped in a simple API call that lets me write data to the Blobstore (just like S3 provides).
Btw, can you share some details on how the Blobstore is implemented internally? What makes blobs different from datastore entities?
Regards,
Eran
Hi Eran,
What you propose (API generates the URL, then posts to it) is the right approach for now. Agreed that something more REST-like would be easier for application developers. This release focuses on web-based forms.
Nothing to share yet about how Blobstore works, but we do have a history of being public about how things work under the covers.
-Brett
Hi,
Can’t an API use a two-step process for uploading as follows?
1. Client makes an API call to request an upload URL. The handler checks credentials and returns the unique URL generated by the Blobstore API.
2. Client uses the unique upload URL to upload the file.
Ben
Hi Ben,
That’s exactly what I said. But that’s a lot of workaround code to write…
There should be a clean easy way for developers to do this…
Regards,
Eran
I have translated this article into Chinese on my blog. I don’t know whether there are any copyright problems with that. If you don’t like it, tell me and I’ll delete it.
I love the Blobstore and mapreduce. I made an example/demo after testing it out.
http://demofileuploadgae.appspot.com/ – my demo
Did anyone figure out how to do a Blobstore upload from running GAE code?
I have a cronjob that fetches some HTML content, processes it and I would like to store it in the blobstore. What I can’t figure out is how to POST the “file” to the UploadHandler I have.
I’d love to see Google provide an API for direct access to the Blobstore, or at least a tempfile under Java. I am developing a service that assembles a zip file from a bunch of assets that are fetched via HTTP, and I just need to be able to spool the data temporarily. I noticed that Python seems to provide a tempfile API. Does Java?
Here’s a pretty good tutorial explaining how to use the Blobstore to store and serve images in a GWT/GAE app. The overview drawing is pretty helpful in understanding all the components.
http://www.fishbonecloud.com/2010/12/tutorial-gwt-application-for-storing.html