Log in

entries friends calendar profile Scribbles in the Dark Previous Previous Next Next
Why Axiom Doesn't Expose SQL - Please Visit http://glyph.twistedmatrix.com/ - This Blog Is Closed.
Sorry about the ads, I can't turn them off.
Why Axiom Doesn't Expose SQL
I didn't put the full title of this article into the "subject" because it was super long.

It Is Pretty Much a Bad Idea to Expose Raw SQL Through Your Database Access Layer


Fun Things I Found Out About Your Company With Administrator Access to your Database

I was originally inspired to write something about this when I read Jonathan Ellis remarking that ORMs should include direct SQL access because Django recently added a different, but still dodgy, 'OR'-operator support to a syntactically disappointing ORM query syntax. However, it really came to a boil for me when I found that Divmod had been running some third-party software with an SQL injection vulnerability. (Yes, we have since patched it, no harm done.)

Security experts have long known that code-injection attacks are pretty easy on many popular programming platforms, and you should take steps to prevent them. It's easy to find commentary on this. Stephen Thorne has had many amusing and insightful things to say about PHP's vulnerability to SQL injection attacks, as well as the occasional dig about just injecting PHP code itself. If you're looking for something more serious, Steve Friedl has written a fairly comprehensive guide to understanding, executing, and preventing against SQL injection.

An ORM's job is to provide an alternative interface to a database. Interfaces should be complete things, not broken fragments of utility which require manual crank-turning to function. If you have to use a different mechanism to access the database within your application, the ORM is incomplete and should be fixed. Sure, many programmers who use ORMs also know SQL, and that is a useful skill, because today these are in closely related problem domains, but they should not have to use SQL within the same context that is using the ORM.

The python "os" module provides (among other things) an alternative interface to a large portion of the POSIX C API. As I said, interfaces should be complete things. If you use a different mechanism to access the POSIX C API, the "os" module is incomplete and should be fixed. Again, sure: many programmers who use Python's "os" module also know the POSIX C API, and that is a useful skill, because these are related problem domains, but they should not have to use the POSIX C API within the same context that is using the "os" module.

In both cases, you can generate the underlying code yourself, and in both cases, people sometimes really need to, so the fact that you can is important. However, Few C programmers ever want to drop back down to C when they're using Python, and will rightly avoid it (as a complexity cost) when they can; yet many Python programmers who use ORMs frequently and loudly declare that they want to use SQL all the time.

Not to pick on Mr. Ellis. The syntax he's reacting to really is abhorrent (although that's no comment on Django as a whole), and the tremendous Ruby on Rails movement seems to largely agree with his point. There are good reasons that people want SQL access from within ORMs. It's simple: most ORMs are really, really awful. They are heavy on the "object" and not so much on the "database". There are a lot of features that SQL provides which they don't expose.

If you use an open-source ORM and find yourself wanting to use SQL to get at one of those features, consider not clamoring for (or not using, if one exists) an SQL execution back-door included in the library. Instead, consider ways to integrate that SQL feature with the existing structure of the ORM, or an extension which wraps that SQL feature on top of the ORM and only generates the SQL you need in one small place. Obviously this goes double if you are a user of Axiom - I do my best to accept any patches that expose new database features that were previously obscured.

Don't just accept the status quo and generate SQL strings from within your application. Originally this post was going to be longer and talk about API structure and communication between programmers and preconditions and postconditions and all kinds of fancy computer-science garbage, but I think it would be better to leave you with just this one thought - the security implications alone are more than enough reason to be extremely sparing, and careful, with the places that your code generates SQL. Isolate it, test it, audit it, and don't make it a habit.

Update: This article is confusingly titled. In fact, Axiom does provide an API for getting at SQL. Store.executeSQL. The point I am stressing here is that axiom does not "expose" it in that it is not a supported, public API, and if you have to call it, Axiom is broken and you should let Divmod know what you needed it for. To attempt to totally deny access to that layer would be unwise; as I said earlier in the article, "you can generate the underlying code yourself, and in both cases, people sometimes really need to, so the fact that you can is important".

Current Music: Every Planet We Reach Is Dead (by Gorillaz on "Demon Days")

5 comments or Leave a comment
fuz22 From: fuz22 Date: January 3rd, 2006 10:50 pm (UTC) (Link)
The major lesson I took from Friedl's piece is that it is wise to use prepared statements.

I have issue with this.

Every API implementation of prepared statements I've seen has relied on integer-indexed variable setting. This is the stupidest thing ever. If I wanted to be doing a lot of hard-coded indexing, I'd be coding in C.

So: Is there an implementation of prepared statements in a high-level language that uses name-based set syntax? If not, am I simply doomed either to be vulnerable to code injection or to use stupid code practices?

The solution I've been using is to replace all quotes with their html equivalents:
s = s.replaceAll("\"", """);
s = s.replaceAll("'", "'");
which has the advantage that the replaced text has no quote characters in it. Friedl implies that this is easily broken if you just know how. Understanding that method would sure light a fire under my willingness to change over.

glyf From: glyf Date: January 4th, 2006 01:34 am (UTC) (Link)
  1. Your complaint about named bind parameters is specious. In all the database bindings I know of, there is a named bind parameters syntax; in "good" bindings (which use database libraries to do it) the syntax is 'select foo from bar where colname = :varname'. In 'bad' ones in Python, where they elect to do it in the driver for a more 'pythonic' syntax (IMHO a total waste of time, which has occasionally been exploited for doing it incorrectly) the syntax is 'select foo from bar where colname = %(varname)s'. I know that C# supports some of these too, because I recently perused some oracle articles on the topic. My Java is super-rusty but I believe it does as well.
  2. Why do you mind binding ordered parameters? When you call functions in most languages, parameters are positional. If you prefer all-keyword languages such as Smalltalk, I guess I can't comment effectively; I don't know what database bindings look like in Smalltalk. Still, I can only assume they support named parameters as well.
  3. It would take 5 minutes, if that long, to write a trivial wrapper function which bound named parameters via a hashtable or somesuch; your application should never concern itself with actually calling 'replaceAll' (what language is that from, anyway...?)
  4. The quoting scheme you have selected is broken; if the input contains the string \'hello\', you destroy the backslashes in the round trip to the database. A pain in the ass if your site ever has to store code snippets as strings in the database - and totally mystifying to your users.
  5. Maybe the strings you're storing aren't coming from or going to HTML - your printed reports will look like garbage, even if you fix the previous deficiency
  6. I don't know of a specific vulnerability, and I don't have much incentive to look very hard, but I can think of avenues of investigation - how does your program deal with Unicode? Is there a way I can convince it to encode a ' on the way to or from the database? Is there anywhere that you forgot to call one of those replaceAll calls? Are there any special control characters that your particular database vendor uses to manipulate the database connection? By using bind parameters, you are using the API that says, "this data came from a user. I do not know what is in it. Please just put it in the database.". Once you've started generating SQL strings from user input, you're taking on all the responsibility for knowing every obscure quirk of every database SQL syntax, quoting rule, ODBC/JDBC/DBI driver feature, unicode database/web input encoding combination, and special crazy extra escape sequences that you are likely to encounter.
Sorry I couldn't give you anything more specific, but I think those reasons are enough to put up with the mild "inconvenience" of positional bind parameters, if that were even something that you had to put up with, which you don't.
fuz22 From: fuz22 Date: January 4th, 2006 03:09 pm (UTC) (Link)
1) I am using Java. The Javadoc is here:

There do not appear to be any named binding mechanisms. This is a problem for me because I frequently use SQL statements of up to 20 parameters.

2) I think your analogy to functions is specious. It's not that they are merely ordered, it is that they are bound to integer indexes. This seems more like using a huge of typed variables for your program named var1, var2, var3, etc..

3) I fail to understand how this would work, and I would really like to. Explain further?

4) heh. I.. uh... totally fucked up the code cut-n-paste. My code replaces annoying characters with their HTML entity replacements, not what appears above. So backslashes don't get replaced at all - the \" there is escaping the double quote in the java string.

5) I make no claims that this is a good general solution, but all of my strings are going to or from HTML.

6) I totally agree that PreparedStatements are the most correct way of dealing with this issue for security. I'm just not sure that for all cases, the most secure way is needed, particularly if I have to use unnamed bindings.
glyf From: glyf Date: January 4th, 2006 03:53 pm (UTC) (Link)
1 - Database support is apparently spotty. Here's the Oracle-specific way of doing it; I don't know if the database you're using works. Using Oracle Named Bind Parameters

2 - Use object arrays or something, loop over them.

3 - Implementation is left as an exercise for the reader (as I said, my java is absurdly rusty), but here's the idea:
Object[] paramDesc = {"select * from foo where name = ", 
                      new MyNamedBindParameter("nameIs"),
                      " and y > ",
                      new MyNamedBindParameter("yGreaterThan")};
// save this somewhere - it constructs the literal query string with ?s, calls 
// prepareStatement, wraps PreparedStatement
MyProtoQuery mpq = new MyProtoQuery(conn, paramDesc); 
MyQuery mq = mpq.create();
mq.bind("nameIs", "bob");
mq.bindInt("yGreaterThan", 100);
ResultSet rs = mq.execute(); // voila

Alternatively, use an ORM which does something even more advanced for you :-)

4 - As it happens I was looking at the HTML source of your entry as I was replying to it! My criticism remains valid. As far as many databases are concerned, '\&1234;' === "&1234;". The backslash is implicitly deleted, since it is an escape. The escape character is not always backslash.

Also, hah, that just reminded me how you break it this style of quoting. Assuming you are binding *two* parameters in one query in this fashion, like so:

"select * from foo where bar = '" + crappyEscape(s1) + "' and baz = '" + crappyEscape(s2) + "'"

make s1 be the string '\' (or your database-specific quoting character of choice) and s2 be the injected SQL of your choice -- as long as it doesn't contain quotes, of course. It would be a real challenge to build a fully syntactically correct query, but in some databases subqueries are executed as they are parsed.

5 - OK, fair enough, but this is a flexibility constraint which you may have to deal with as a maintenance issue if the current fad of HTML-based applications ever fades. (Does the web seem eternal to you? "The demand for punch card operators will exceed 100,000,000 by 2010!")

6 - It may be worthwhile to consider more convenient wrappers for correct ways to do things, but that hardly seems like an excuse for doing things wrong.
glyf From: glyf Date: January 8th, 2006 10:41 pm (UTC) (Link)
For those of you still thinking that maybe manual SQL generation is a good idea: it isn't, I won the argument :-).
5 comments or Leave a comment