Fun with Forefront TMG Beta 3 and blocking download of certain file types over HTTP based on the server's response body: Part 1

 - 1. Intro
 - 2. Overview

 1. Intro
Say you have Forefront TMG (Beta 3) and you want to block users from downloading '.exe' files, PDF files, some audio and video files, or certain archives such as RAR, 7-Zip, etc.
Please note that when one browses a web site, the browser "downloads" HTML pages, image files, '.css' files, '.js' files, etc. I mention this because the term "download" can be misleading, as in "I want users to view images but not to download them" (yes, I know this may sound funny, but I didn't say it). Actually, this can make some sense (though I doubt this was the sense intended when I heard it) if you want the browser not to display an image automatically but to prompt the user instead, which can be done using the HTTP header Content-Disposition with the attachment value.

The usual approach on ISA Server 2006/Forefront TMG Beta 3 (assuming you're not using a third-party add-on) is to:

 - block the extension '.desired extension' within the URL request: requested file, see Figure1. Note that although we can "whitelist" the allowed extensions, in practice this may not be a feasible solution for web browsing: it can become difficult to manage, and with all the URL rewrite techniques employed by various web sites, we can end up blocking legitimate traffic.

 - block the extension '.desired extension' within the URL request: URL string, see Figure2. Note that the request does not necessarily have to be a straight request for a certain file (a request URL ending with the file name), so we may have more flexibility with this approach than with blocking the desired extensions within the Extensions tab of the HTTP Filter.

 - block content type: 'x/y'. We can use Forefront TMG Beta 3's built-in content types, see Figure3 and Figure4. This approach may be a more feasible way of "whitelisting" allowed content types. This way we may also block responses from web servers whose admins don't want to play by the rules and decide to use their own content types for certain files.
Or we can manually block the undesired content type by signature, see Figure5, although we lose the "whitelisting" approach this way.

 - block content transferred with the content-disposition header: '.desired extension', see Figure6. The content-disposition header is typically used in combination with the content-type header to tell the browser how to handle a file. For example, if we use only the content-type header for image files, say .jpeg files (Content-Type: image/jpeg), the browser will immediately display the image. If we add a Content-Disposition: attachment; filename=foo.jpeg header, the image should be shown to the user only if the user requests it (a browser prompt; generally speaking, a way to force a manual download).

Figure1: Forefront TMG Beta 3 - HTTP Filter: Extensions tab - Block .exe

Figure2: Forefront TMG Beta 3 - HTTP Filter: Signatures tab - Block string from request URL

Figure3: Forefront TMG Beta 3: Content Types

Figure4: Forefront TMG Beta 3 Content Types - Application

Figure5: Forefront TMG Beta 3 - HTTP Filter: Signatures tab - Block Content Type

Figure6: Forefront TMG Beta 3 - HTTP Filter: Signatures tab - Block Content Disposition
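
To make the header-based checks above a bit more concrete, here is a minimal Python sketch of the kind of decision they amount to. It is not TMG configuration and not how the HTTP Filter is implemented; the blocked lists and the name should_block are my own illustrative assumptions.

```python
from email.message import Message

# Illustrative lists only (assumptions, not TMG defaults).
BLOCKED_TYPES = {"application/zip", "application/x-zip-compressed"}
BLOCKED_EXTENSIONS = (".zip", ".exe")


def should_block(content_type, content_disposition=None):
    """Decide from the response headers alone, as the usual approach does."""
    msg = Message()
    msg["Content-Type"] = content_type
    if content_disposition:
        msg["Content-Disposition"] = content_disposition

    # Check 1: the declared content type.
    if msg.get_content_type() in BLOCKED_TYPES:
        return True

    # Check 2: the file name announced in Content-Disposition, if any.
    filename = msg.get_filename()
    if filename and filename.lower().endswith(BLOCKED_EXTENSIONS):
        return True

    return False


if __name__ == "__main__":
    print(should_block("application/x-zip-compressed"))
    print(should_block("application/octet-stream",
                       "attachment; filename=archive.zip"))
    print(should_block("image/jpeg", "attachment; filename=foo.jpeg"))
```

Note that a response declaring only application/octet-stream, with no Content-Disposition header, passes both checks no matter what the body really contains.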

You may like to read:
 - Hypertext Transfer Protocol -- HTTP/1.1, section 14.17 Content-Type
 - Communicating Presentation Information in Internet Messages: The Content-Disposition Header Field
 - Multimedia MIME Reference
 - Test Cases for HTTP Content-Disposition header and RFC 2231/2047 Encoding
 - Application of RFC 2231 Encoding to Hypertext Transfer Protocol (HTTP) Headers (draft-reschke-rfc2231-in-http-latest)

If we cover all of these, we end up with pretty nice content type control. Indeed, this is a manual approach and we have to work a little bit, but if you buy Forefront TMG you can do all of this at no extra charge.
This way we can deal with many of the situations we may come across.

Still, there are certain cases we cannot cover this way.
As you have noted from the above, on Forefront TMG Beta 3 we do not identify the real file type; rather, we make decisions based on the HTTP headers (request and response headers).
Imagine a web server admin who changes the MIME type for '.zip' archives on his web server from application/x-zip-compressed to application/octet-stream or similar. Depending on how a certain file is requested, and on what we've allowed and blocked, our restrictions may be bypassed.
Or take the simpler case when a user changes a file's extension from '.zip' or '.exe' to '.jpg' (an image file, something you are unlikely to block), and uploads it to a file sharing web server or emails it to a friend using a web mail service like the one from Yahoo!. This friend, while at work behind Forefront TMG, downloads that file and then renames its extension. If you want to allow Yahoo! webmail or some file sharing web servers (this may vary based on your business type), even not very smart users can bypass your file (content type) restrictions this way.
Also note that if, for example, you want to block users from downloading '.exe' files, such files can be archived. So if you don't block archives as well, you only partially block (such is the nature of blacklisting) the downloading of '.exe' files.

So can we instruct Forefront TMG Beta 3 to somehow handle the renamed extension situation, identify certain file types, and complement the usual approach described above?
If we look at what we have by default on TMG Beta 3, the answer could be: maybe. Note that what we do below we may also be able to do with ISA Server 2006, but I want to mention a new feature of TMG Beta 3 which, if Microsoft gives us access to it, may give us a smarter way of identifying and blocking certain file types accessed by users.

Handling the renamed extension situation might be accomplished with some add-ons (which also incorporate other features and provide many other benefits) for ISA Server (at the time of writing, Forefront TMG is in its beta stages, so you may find few add-ons for it). However, these add-ons may not be free, and for the moment you may not be willing to invest in such a solution.

The downside of the manual approach below is that you will have to work a little, sometimes a little more, and that it is not a very precise, flexible or smart approach.
Also, we just blacklist certain responses from the web servers; it's not a whitelisting approach.

A file has a header (plus sub-headers) incorporating specific string(s) which help us identify what that file is (I know that this may sound lame in certain contexts, but let's assume the users won't try so hard :-) ).
Thus we need to search for and identify the specific string(s) in order to block a certain file type. The accuracy of the signature will depend, to a certain extent, on the string(s) we use to block a certain type of file and on the way we can write this signature.
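
As a rough illustration of what such "specific strings" look like, here is a small Python sketch that maps a few well-known leading byte sequences to file types. The table is deliberately short and the helper name identify is my own; this is only a sketch of the idea, not the signature format used by the HTTP Filter.

```python
# A few well-known leading byte sequences ("magic numbers").
SIGNATURES = {
    "zip": b"PK\x03\x04",          # local file header of a non-empty ZIP
    "exe": b"MZ",                  # DOS/Windows executable
    "rar": b"Rar!\x1a\x07",        # RAR archive
    "7z": b"7z\xbc\xaf\x27\x1c",   # 7-Zip archive
    "pdf": b"%PDF-",               # PDF document
    "jpeg": b"\xff\xd8\xff",       # JPEG image
}


def identify(data):
    """Return the name of the first signature matching the start of data."""
    for name, magic in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


if __name__ == "__main__":
    import io
    import zipfile

    # Build a tiny ZIP archive in memory and identify it by content only.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("hello.txt", "hello")
    print(identify(buf.getvalue()))
```

A short signature such as "MZ" is obviously more prone to false positives than a longer one, which is the accuracy trade-off mentioned above.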

We mainly need three things: a hex editor, Google (or your favorite search engine) and Wireshark (or your favorite protocol analyzer).
The hex editor is needed to open the file and look at it.
Google is needed (or maybe not, if you are pretty sure of yourself) to search for specific file headers (sub-headers), so we can get directions (if needed) or just confirm our findings.
Wireshark is needed if we want to analyze a specific server's response for the pattern we want to block.
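
If you don't have a hex editor at hand, a few lines of Python can give a similar view of the first bytes of a file. This is just a convenience sketch; the function name hex_dump and the layout are my own choices.

```python
import io
import zipfile


def hex_dump(data, width=16):
    """Return a classic offset / hex / ASCII dump of data as a string."""
    pad = width * 3
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        # Show printable ASCII as-is, everything else as a dot.
        text_part = "".join(
            chr(byte) if 32 <= byte < 127 else "." for byte in chunk
        )
        lines.append(f"{offset:08x}  {hex_part:<{pad}} {text_part}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Dump the first 32 bytes of a small in-memory ZIP archive.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("hello.txt", "hello")
    print(hex_dump(buf.getvalue()[:32]))
```

The first four bytes of the dump are 50 4b 03 04, that is "PK" followed by 0x03 0x04.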

I will use Forefront TMG Beta 3 below to demonstrate the process.

Now, as can be seen from Figure7, when we configure Forefront TMG Beta 3/ISA Server 2006 to search through the HTTP (response) body, we may introduce some performance issues.
So we must define our signature carefully to limit these issues. Remember that unless we look at the HTTP response body, we will not really be able to tell the type of the file being downloaded.
Also note that this is a primitive form of search: we simply search for a "keyword", whereas it would have been more useful if we could have used a regex.
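
To show why a plain keyword search is less precise than a regex, here is a small Python comparison. It only illustrates the difference between the two kinds of matching; it says nothing about how the HTTP Filter is implemented, and both function names are my own.

```python
import re

ZIP_MAGIC = b"PK\x03\x04"

# Regex anchored at the very start of the body.
ANCHORED_ZIP = re.compile(rb"\APK\x03\x04")


def keyword_match(body):
    """Keyword-style search: the bytes may appear anywhere in the body."""
    return ZIP_MAGIC in body


def anchored_match(body):
    """Regex-style search: the bytes must be the first bytes of the body."""
    return ANCHORED_ZIP.match(body) is not None


if __name__ == "__main__":
    # A page that merely mentions the ZIP signature somewhere in its text.
    page = b"<html>A ZIP file starts with PK\x03\x04 ...</html>"
    print(keyword_match(page), anchored_match(page))
```

The keyword search flags the harmless page as well, while the anchored pattern only matches a body that really begins with the signature.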

Figure7: Forefront TMG Beta 3 - Warning HTTP body search

 2. Overview
But before we proceed, let's have a look at some web server responses for a requested file, and highlight that in various cases simply analyzing the HTTP headers from the server's response may not be enough to determine the real type of a file (so far we've only discussed this in theory).

First a simple web server response for a ZIP archive download request using Wireshark, see Figure8(click on the image for the full picture):

Figure8: Wireshark - Simple web server's response for a ZIP archive download request

I'm calling it simple because it was a straight request for a ZIP archive, and the response is using just the HTTP content-type header, all by the rules.

Now let's take a look at a more "complicated" request and response. For example for downloading a ZIP archive(attachment download) from Yahoo! webmail, see Figure9 and Figure10(click on the images for the full pictures):

Figure9: Wireshark - Web server's response for a ZIP archive download request(Yahoo! webmail ZIP attachment download)

Figure10: Wireshark - Web server's response for a ZIP archive download request(Yahoo! webmail ZIP attachment download): Follow TCP Stream

We can say it's a more "complicated" request because it's not a straight request for a specific file (rather, the requested file name is found somewhere inside the requested URL, see Figure10). This means that you cannot block the request using the Extensions tab of the HTTP Filter on ISA Server/Forefront TMG. But you can block it if you use a signature to block the '.zip' string in the requested URL.
We can say it's a more "complicated" response because it uses both the content-type and the content-disposition HTTP headers. But they are both in the correct format, playing by the rules, so you can block it by content-type and/or content-disposition.
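
The difference between the two request-side checks can be sketched in Python as follows. The URLs are made up for illustration (they are not the real Yahoo! webmail URLs from the captures), and the two helper names are my own.

```python
import posixpath
from urllib.parse import urlsplit


def path_extension(url):
    """Extension of the requested path only (Extensions tab style check)."""
    return posixpath.splitext(urlsplit(url).path)[1].lower()


def url_contains(url, needle):
    """Substring search over the whole URL (URL signature style check)."""
    return needle.lower() in url.lower()


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    direct = "http://www.example.com/files/archive.zip"
    webmail = "http://mail.example.com/download?filename=archive.zip&id=1"

    for url in (direct, webmail):
        print(repr(path_extension(url)), url_contains(url, ".zip"))
```

For the second URL the path has no extension at all, so only the search over the whole URL string catches it.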

Now let's raise the bar a little.
Let's make a straight request for a ZIP archive, but "mangle" the server's response a little bit.
To do that I will add the following (a simple test) on an Apache test web server, see Figure11:

Figure11: Apache Config

I could have done something on an IIS 6.0 server with MIME Types, see Figure12(click on the image for the full picture), but the Apache test web server was handy next to me and I want to "mangle" the content-disposition headers too.

Figure12: IIS 6 - Mime Types

If we now analyze with Wireshark the test web server's response for our request, we will see something like in Figure13(click on the image for the full picture):

Figure13: Wireshark - Web server's "mangled" response for a ZIP archive download request

As can be noted from Figure13, we can no longer use the content-type and/or content-disposition headers to block the download of ZIP archives.

Speaking of changing the extension, say I've emailed the '.zip' file as a '.jpg' to a friend (who is using Yahoo! webmail), and he downloads it as a '.jpg' and then renames it to '.zip', see Figure14(click on the image for the full picture). Note that Wireshark senses this is not a valid JPEG object:

Figure14: Wireshark - Yahoo! .jpg attachment

As can be noted from Figure14, if the user changes the extension we can no longer use the content-type and/or content-disposition headers, the URL extension block or the URL string block to stop the download of ZIP archives.
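
The renamed extension case boils down to the file name and the file content disagreeing. Below is a minimal Python sketch of such a consistency check, similar in spirit to what Wireshark does when it complains about the JPEG object. The small extension table and the function name extension_mismatch are my own assumptions.

```python
# Expected leading bytes for a few extensions (illustrative, not complete).
EXPECTED_MAGIC = {
    ".jpg": b"\xff\xd8\xff",
    ".jpeg": b"\xff\xd8\xff",
    ".zip": b"PK\x03\x04",
    ".pdf": b"%PDF-",
}


def extension_mismatch(filename, data):
    """True when the content does not start with the bytes the name implies."""
    name = filename.lower()
    for extension, magic in EXPECTED_MAGIC.items():
        if name.endswith(extension):
            return not data.startswith(magic)
    # Unknown extension: nothing to compare against.
    return False


if __name__ == "__main__":
    zip_bytes = b"PK\x03\x04" + b"\x00" * 26   # starts like a ZIP archive
    jpeg_bytes = b"\xff\xd8\xff\xe0" + b"\x00" * 16

    print(extension_mismatch("holiday.jpg", zip_bytes))
    print(extension_mismatch("holiday.jpg", jpeg_bytes))
```

The first call reports a mismatch (a ZIP archive named '.jpg'), the second does not.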

If you've looked carefully at the HTTP responses we've pictured so far, you may have observed that, as far as '.zip' archives are concerned, a certain pattern repeats in those responses.

Before we end our fun, I will do one more thing. This is a little silly and not very practical.
I will force GZIP HTTP compression on '.jpg' files on my test Apache web server.
Note that a web server may force HTTP compression on some files even if the client (which could be the web proxy) does not request HTTP compression; this is not against the current RFC standards.
The silly and impractical part is that I forced HTTP compression on the web server for '.jpg' files ('.jpg' images are already compressed), see Figure15.
I do so because HTTP compression will "alter" the server's response, and the string we need to block in the server's response body will be "hidden".

Figure15: Apache Config

Let's take a look, web server's response decompressed, see Figure16(click on the image for the full picture), note that Wireshark senses that the '.jpg' file is not a true JPEG file:

Figure16: Decompressed web server's answer - .zip as .jpg forced compression

Web server's response compressed, see Figure17(click on the image for the full picture)

Figure17: Compressed web server's answer - .zip as .jpg forced compression

And as can be noted from Figure17, if the web proxy is not able to decompress the web server's "forced compressed response" and just lets it pass through, we may not be able to block the needed string.
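
The effect of compression on a body signature can be reproduced in a few lines of Python. This sketch only mimics the situation (a ZIP archive sent with gzip content encoding); the helper names are my own, and it does not describe how TMG handles compression internally.

```python
import gzip
import io
import zipfile

ZIP_MAGIC = b"PK\x03\x04"


def make_zip():
    """Build a small ZIP archive in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("hello.txt", "hello " * 100)
    return buf.getvalue()


def body_is_zip(body, content_encoding=None):
    """Check the leading bytes, decompressing first if told the body is gzip."""
    if content_encoding == "gzip":
        body = gzip.decompress(body)
    return body.startswith(ZIP_MAGIC)


if __name__ == "__main__":
    compressed = gzip.compress(make_zip())

    # Inspecting the bytes as they travel on the wire.
    print(body_is_zip(compressed))
    # Inspecting after undoing the content encoding.
    print(body_is_zip(compressed, content_encoding="gzip"))
```

On the wire the body starts with the gzip header bytes 1f 8b rather than with "PK", so the check only succeeds once the response has been decompressed.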

TMG Beta 3 is configured by default to request compression, see Figure18(click on the image for the full picture).
On ISA Server 2006 you may need to manually configure that.

Figure18: Forefront TMG Beta 3 - Default HTTP Compression Settings

So, as we have seen, even if the web server plays by the rules, a user may change the file extension to something common like a popular image extension. In that case, unless we analyze the file itself, we won't be able to tell the real type of the file just by looking at its extension and the HTTP headers from the web server's response.
Also, as already said, what we will do below is just blacklisting and not whitelisting, so if a "determined" user finds a way "to pack" his files, he may bypass your restrictions.

In part 2 we will instruct TMG Beta 3's HTTP filter to block certain web server responses based on specific hex patterns. This may complement the usual approach of controlling content types, and help in certain situations.