Danger to HTTPS, doom to SPDY

Since the BREACH attack, it seems that there is no way to transport content securely in the HTTP world.

The BREACH attack is an HTTP version of CRIME, which recovers parts of an encrypted message by analyzing how well different content compresses. It is well known that one can distinguish a picture from text by its compression ratio; before CRIME, however, there was no easy way to work out exactly what the content was from the ratio alone. But the leak has always existed. The words “faster” and “sunoru” have the same length, yet the Shannon entropy (in bits per character) of “faster” is 2.58496, while that of “sunoru” is 2.25163. So if you know the original length (6) of the words and can also learn their entropy, you can extract a good deal of information from the result. Against a “perfect” compression algorithm, with an observe-only channel, you can learn how often letters repeat within each word, which is generally not very useful (though it still should not be public). But real-world compression algorithms are NOT perfect, and the real-world environment is NOT observe-only. You can send messages to the server to determine which compression method it is using, and you can extract much more information from the simple ratio when multiple requests are made, as in the CRIME attack.
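The two entropy figures above can be reproduced with a few lines of Python. This is only a sketch: it assumes “entropy” means the Shannon entropy of the letter frequencies inside a single word, and the function name `shannon_entropy` is made up for the illustration.

```python
import math
from collections import Counter


def shannon_entropy(word: str) -> float:
    """Shannon entropy of the letter distribution in `word`, in bits per character."""
    counts = Counter(word)
    total = len(word)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


for word in ("faster", "sunoru"):
    # "faster" has six distinct letters; "sunoru" repeats the letter "u".
    print(word, round(shannon_entropy(word), 5))
```

The repeated “u” is the only reason the second word scores lower, which is exactly the kind of detail that leaks through a compression ratio.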

For HTTPS, it represents a danger to web pages that carry simple information. For example, some banks in China use a picture of digits to show how much money you have; when the picture is compressed, it is pretty easy to recover the number it shows from the compression ratio. Using a precomputed table, you can decode millions of those “money pictures” per second on a MacBook Air. So if you find that your bank transmits your balance as a picture, you should be aware that this may as well be a deliberate way of publishing that information to the whole net.

However, with SPDY your app may be cracked even without such a deliberate setup. SPDY’s speed relies on compressed headers, which include the URL, the cookies, and the authorization token. As the client sends these headers whenever it visits the same site, you just need to XSS the client to a static page (e.g. a 404 page), and then you can recover all the information in the headers without any painful struggle. And once you have the headers, you have the URL (so the complete browsing history is public), the cookie and authorization token (so the person’s login state), and with them all the content of the page. So it is just like visiting the page over HTTP without the S.
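The core of the leak can be shown with zlib alone. The sketch below is not the CRIME attack itself, only its basic observation: when attacker-chosen text is compressed together with a secret header, a guess that matches the secret produces a shorter output. The header value, the guesses, and the function name `compressed_length` are all invented for the example, and encryption is left out because it hides the content but not the length.

```python
import zlib

# Hypothetical secret header that the victim's client attaches to every request.
SECRET_HEADER = "Cookie: session=q7Xw2LmZ9pKc4RtB"


def compressed_length(attacker_text: str) -> int:
    """Size an eavesdropper would observe when the attacker's text and the
    secret header are compressed in the same stream."""
    plaintext = attacker_text + "\r\n" + SECRET_HEADER
    return len(zlib.compress(plaintext.encode("ascii")))


RIGHT_GUESS = "Cookie: session=q7Xw2LmZ9pKc4RtB"
WRONG_GUESS = "Cookie: session=Hy5Dn8VfJ3sGe6Ua"

# The matching guess lets the compressor replace the whole secret header
# with one back-reference, so the observed length is smaller.
print("right guess:", compressed_length(RIGHT_GUESS))
print("wrong guess:", compressed_length(WRONG_GUESS))
```

A real attack refines the guess a little at a time over many requests, which is why being able to make the client resend its headers matters so much.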

Not only HTTPS and SPDY are affected: Tor, which uses gzip as its compression algorithm, is affected too. But Tor may not be so easy to crack, as it reuses TCP tunnels… SSH with compression enabled can also be decrypted this way; however, the gzip guessing takes some skill and luck, as you cannot easily make the user resend things.

In conclusion, SPDY is just like clear text for a careful attacker, and HTTPS is not so secure anymore…

The good news is that the Network Working Group has finally recognized the danger in compression and decided to drop compression support in TLS 1.3 draft-02. Did I say that was good news? It does not seem so pleasant for those with limited network resources…

HTTPS SNI

SNI means Server Name Indication, a technology that lets the server know which domain the client is connecting to and return the corresponding certificate, which makes it possible for a single IP to serve multiple HTTPS sites. It is defined in RFC 6066, section 3.

The extension changes the TLS handshake. The client should include a list of structures, one for each way of naming a server (currently only DNS hostnames are widely supported), giving the name of the server it wants to connect to. If the server has a matching certificate, the handshake goes on normally. If not, the server should either send a fatal-level alert and drop the connection, or just go on as if nothing happened (and present the default certificate).
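To make the handshake change concrete, here is a small sketch of both sides. The byte layout follows the `server_name` extension of RFC 6066 (extension type 0, name type 0 for a DNS hostname, alert 112 for `unrecognized_name`); the certificate table, the file names, and the `strict` switch are made up for the illustration and do not come from any real server.

```python
import struct
from typing import Optional

HOST_NAME = 0            # NameType host_name (RFC 6066)
UNRECOGNIZED_NAME = 112  # alert description unrecognized_name (RFC 6066)


def build_server_name_extension(hostname: str) -> bytes:
    """Encode the server_name extension a client puts in its ClientHello."""
    name = hostname.encode("ascii")
    entry = struct.pack("!BH", HOST_NAME, len(name)) + name
    name_list = struct.pack("!H", len(entry)) + entry
    # Extension type 0 is server_name; then the length of the extension data.
    return struct.pack("!HH", 0, len(name_list)) + name_list


# Hypothetical certificate table for one IP address serving several sites.
CERTIFICATES = {"shop.example": "shop-cert.pem", "blog.example": "blog-cert.pem"}
DEFAULT_CERTIFICATE = "default-cert.pem"


def select_certificate(server_name: Optional[str], strict: bool) -> str:
    """Pick a certificate; on an unknown name either abort or fall back."""
    if server_name in CERTIFICATES:
        return CERTIFICATES[server_name]
    if strict and server_name is not None:
        # Stands in for sending a fatal alert and dropping the connection.
        raise ConnectionAbortedError(f"fatal alert {UNRECOGNIZED_NAME}")
    return DEFAULT_CERTIFICATE


print(build_server_name_extension("example.com").hex())
print(select_certificate("blog.example", strict=True))
```

In practice a TLS library does the encoding for you; in Python's `ssl` module, for instance, passing `server_hostname` to `SSLContext.wrap_socket` is what makes the client send the name.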

The extension also affects the session cache of the TLS server. A TLS server that supports the extension will never resume a session for a client if the server_name does not match, even if everything else the client presents checks out.
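The session cache rule can be pictured as one extra comparison at resumption time. This is a toy model only: the `CachedSession` and `SessionCache` classes and their fields are hypothetical and far simpler than a real TLS implementation.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class CachedSession:
    session_id: bytes
    server_name: Optional[str]  # name the client sent when the session was created
    master_secret: bytes


class SessionCache:
    def __init__(self) -> None:
        self._sessions: Dict[bytes, CachedSession] = {}

    def store(self, session: CachedSession) -> None:
        self._sessions[session.session_id] = session

    def resume(self, session_id: bytes, server_name: Optional[str]) -> Optional[CachedSession]:
        """Return the cached session only if the requested name matches;
        None means the server must fall back to a full handshake."""
        session = self._sessions.get(session_id)
        if session is None or session.server_name != server_name:
            return None
        return session


cache = SessionCache()
cache.store(CachedSession(b"\x01", "shop.example", b"secret"))
print(cache.resume(b"\x01", "shop.example") is not None)  # same name: resumed
print(cache.resume(b"\x01", "blog.example") is not None)  # other name: refused
```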

Some people think that SNI adds security risk because the client transmits the server name in clear text. However, for a TLS site without SNI, anyone can find out whom the client is talking to simply by connecting to the same server: the IP address of a traditional TLS server already gives away the domain. So sending the domain name adds no security risk to the protocol.

In fact, as the extension provides one more check on the session cache, it actually reduces the risk (though this already seems impossible and pointless in a traditional TLS server) that the server uses the wrong TLS session, one opened by an attacker, to send messages to the user.

SQL is not at its end

It is often said today that, in a distributed cloud environment, no good DB can offer both ACID and SQL support while keeping performance up. Though that is true in some cases, it does not apply to every DBMS. Berkeley DB, a famous key/value-based NoSQL DB, supports most SQL statements as well as XML/XQuery, and it also offers a Java object API.

NoSQL means “not only SQL”; it does not mean “NO SQL”. Mongo and its followers have set a bad example. A NoSQL DBMS need not use SQL as its base query language, but it should be able to support SQL as a higher-layer query language, just as FoundationDB tried to do. Of course, the problem is ACID, which is rather difficult for a sharded DB, a design that is becoming more and more common as cloud computing conquers the server world. The SQL language itself is not difficult to implement; ACID is.

But ACID is a must-have feature for many apps. Not only bankers need ACID; every app developer who wants to build a simple but robust app needs transactions among their tools. No one can stand an app that only behaves correctly when the user gets lucky. Those who want to throw away ACID cannot go far, as many things are simply not possible without it.

There are many ways to overcome the problem of implementing ACID. In the cloud, locking is unacceptable unless there is some way to lock at a very small, precise scale, which is not an easy job across 100+ shard servers. Log-and-check is another way, and it is easier than precise locks, though it is not as easy and comfortable as a single DBMS under a good old big lock, and it imposes an execution time limit in some implementations. But it is possible, and that is enough.
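One way to read “log and check” is optimistic concurrency control: a transaction logs the version of everything it reads, buffers its writes, and at commit time checks that nothing it read has changed. The sketch below works under that assumption, in a single process, with invented class names; a real sharded system would also have to make the check-and-apply step itself atomic across servers, which is the hard part.

```python
from typing import Any, Dict, Tuple


class VersionedStore:
    """Key-value store that keeps a version counter per key."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[int, Any]] = {}

    def read(self, key: str) -> Tuple[int, Any]:
        return self._data.get(key, (0, None))

    def commit(self, read_versions: Dict[str, int], writes: Dict[str, Any]) -> bool:
        # Check: every key the transaction read must still be at the logged version.
        for key, seen_version in read_versions.items():
            if self.read(key)[0] != seen_version:
                return False
        # Apply: install the buffered writes and bump their versions.
        for key, value in writes.items():
            version, _ = self.read(key)
            self._data[key] = (version + 1, value)
        return True


class Transaction:
    def __init__(self, store: VersionedStore) -> None:
        self.store = store
        self.read_versions: Dict[str, int] = {}  # the "log"
        self.writes: Dict[str, Any] = {}

    def get(self, key: str) -> Any:
        if key in self.writes:
            return self.writes[key]
        version, value = self.store.read(key)
        self.read_versions.setdefault(key, version)
        return value

    def put(self, key: str, value: Any) -> None:
        self.writes[key] = value

    def commit(self) -> bool:
        return self.store.commit(self.read_versions, self.writes)


store = VersionedStore()
setup = Transaction(store)
setup.put("balance", 100)
setup.commit()

first, second = Transaction(store), Transaction(store)
first.put("balance", first.get("balance") - 10)
second.put("balance", second.get("balance") - 20)
print(first.commit())   # the first committer wins
print(second.commit())  # the second one read a stale version and must retry
```

No lock is held while the transactions run; the price is that the loser has to redo its work, which is where a time limit on transactions tends to come from.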

If ACID is possible in a cloud environment, SQL will be too. But it may exist as a layer on top of an ACID system built on some simpler API. In any case, SQL will not be ended by NoSQL and the cloud; it will still be used in many places by whoever wants to keep data updates easy (or even possible). Maybe one day there will be only copies and no references in the DB world, but I think that day has not come yet.
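As a toy picture of SQL living as a layer on top of a simpler API, the sketch below answers one narrow kind of SELECT using nothing but key lookups. Everything here is an assumption made for the illustration: the `table/primary_key` key scheme, the function name `execute_select`, and the tiny SQL subset it accepts. A real layer needs a proper parser, a planner, and transactions underneath.

```python
import re
from typing import Any, Dict, Optional

# Accepts only: SELECT <columns or *> FROM <table> WHERE id = <key>
SELECT_PATTERN = re.compile(
    r"\s*SELECT\s+(\*|[\w\s,]+?)\s+FROM\s+(\w+)\s+WHERE\s+id\s*=\s*(\w+)\s*;?\s*",
    re.IGNORECASE,
)


def execute_select(kv: Dict[str, Dict[str, Any]], sql: str) -> Optional[Dict[str, Any]]:
    """Translate a primary-key SELECT into a single key-value get."""
    match = SELECT_PATTERN.fullmatch(sql)
    if match is None:
        raise ValueError("unsupported statement")
    columns, table, key = match.groups()
    row = kv.get(f"{table}/{key}")  # rows are stored under "table/primary_key"
    if row is None:
        return None
    if columns == "*":
        return dict(row)
    return {name.strip(): row[name.strip()] for name in columns.split(",")}


# A plain dict stands in for the underlying key-value store.
kv_store = {"accounts/42": {"id": "42", "name": "alice", "balance": 100}}
print(execute_select(kv_store, "SELECT name, balance FROM accounts WHERE id = 42"))
```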