[Inteproxy-commits] r359 - in trunk: . inteproxy test

scm-commit at wald.intevation.org scm-commit at wald.intevation.org
Tue Mar 6 18:18:24 CET 2012


Author: aheinecke
Date: 2012-03-06 18:18:24 +0100 (Tue, 06 Mar 2012)
New Revision: 359

Added:
   trunk/inteproxy/decompressstream.py
   trunk/test/test_decompressstream.py
Modified:
   trunk/
   trunk/ChangeLog
   trunk/inteproxy/httpmessage.py
   trunk/inteproxy/proxycore.py
   trunk/test/test_inteproxy.py
Log:
Merged revisions 347-357 via svnmerge from 
svn+ssh://aheinecke@svn.wald.intevation.org/inteproxy/branches/compression

........
  r347 | aheinecke | 2012-02-22 17:06:21 +0000 (Wed, 22 Feb 2012) | 8 lines
  
  Add support for gzip/deflate compression in HTTPMessage
  
  The read method now uncompresses the data if necessary and always
  returns the uncompressed value. Both compression methods use
  a zlib decompression object with a different Window Size.
  This also allows decompressing chunks of the stream without
  loading the complete stream into memory.
........
  r348 | aheinecke | 2012-02-22 17:08:53 +0000 (Wed, 22 Feb 2012) | 2 lines
  
  Add Accept-Encoding headers to http requests
........
  r349 | aheinecke | 2012-02-22 17:58:14 +0000 (Wed, 22 Feb 2012) | 3 lines
  
  Break lines at 80 characters in the docstring and
  change indention a bit.
........
  r350 | aheinecke | 2012-02-23 15:22:26 +0000 (Thu, 23 Feb 2012) | 4 lines
  
  Add TestInteProxyCompressedConnection to test content-encodings
  
  A test for compression on a chunked connection is missing, yet
........
  r351 | aheinecke | 2012-02-23 15:26:28 +0000 (Thu, 23 Feb 2012) | 5 lines
  
  Check for the need to decompress a response on initialization
  of the response.
  This allows to correctly change the headers (Content-Encoding/
  Content-Length) before forwarding the response.
........
  r352 | aheinecke | 2012-02-23 17:53:21 +0000 (Thu, 23 Feb 2012) | 10 lines
  
  Move compression logic out of the httpmessage classes.
  The decompression will still be transparent for the transfer_data
  functions but no longer be handled by the Httpmessage class.
  
  Also the accept-encoding header is no longer overwritten and is
  only added if the client did not request gzip or deflate.
  If the request already contained accept-encoding and no rewrite
  is neccessary the response will stay encoded.
........
  r353 | aheinecke | 2012-02-24 09:33:54 +0000 (Fri, 24 Feb 2012) | 10 lines
  
  Add exception handling to compression, return raw data
  if it can not be decompressed.
  Change decompressed read to allow chained read wrappers.
  
  Add test to TestInteProxyCompressedConnection for handling
  an invalid compressed response.
  
  Improve comments.
........
  r354 | aheinecke | 2012-02-24 16:04:28 +0000 (Fri, 24 Feb 2012) | 12 lines
  
  * M inteproxy/proxycore.py:
    Add method decompressed_read to read decompressed
    data from a compressed response and use it where
    data from a httpresponse is read.
  * M test/test_inteproxy.py:
    Disable test in TestInteProxyCompressedConnection for handling
    an invalid compressed response, as InteProxy will now crash in
    that case.
  * M inteproxy/httpmessage.py:
    Add decompressor parameter to read_entire_message to decompress
    the body with it.
........
  r355 | aheinecke | 2012-02-29 09:57:31 +0000 (Wed, 29 Feb 2012) | 11 lines
  
  Refactor compression, HTTPMessage now reads from an
  internal _body_stream which is a DecompressStream
  in the case that do_decompress is called before reading.
  
  do_decompress also now modifies the header to remove
  the Content-Encoding and lenght.
  
  The body stream in httpmessage is now also set
  when the body is set.
........
  r356 | aheinecke | 2012-03-02 10:09:39 +0000 (Fri, 02 Mar 2012) | 2 lines
  
  Fix indention
........
  r357 | aheinecke | 2012-03-02 10:23:30 +0000 (Fri, 02 Mar 2012) | 8 lines
  
  Fix Bug in decompressstream reading that caused the
  read not to read everything correctly when a negative
  parameter was passed after some parts had already been
  read.
  Added unit test for decompressstream reading.
  
  Some cleanup in proxycore
........



Property changes on: trunk
___________________________________________________________________
Modified: svnmerge-integrated
   - /branches/compression:1-346
   + /branches/compression:1-358

Modified: trunk/ChangeLog
===================================================================
--- trunk/ChangeLog	2012-03-06 17:17:41 UTC (rev 358)
+++ trunk/ChangeLog	2012-03-06 17:18:24 UTC (rev 359)
@@ -1,3 +1,83 @@
+2012-03-02	Andre Heinecke	<aheinecke at intevation.de>
+	* A test/decompressstream.py:
+	  Added test for decompressed reading
+	* M inteproxy/decompressstream.py:
+	  Fix reading of the complete stream after starting to read
+	  small chunks.
+	* M inteproxy/proxycore.py:
+	  Remove the response parameter again for the transfer_data
+	  functions.
+
+2012-02-29	Andre Heinecke	<aheinecke at intevation.de>
+	* M inteproxy/proxycore.py:
+	  Remove decompressed_read method.
+	  Remove method get_decompress_object
+	* M inteproxy/httpmessage.py:
+	  Added do_decompress method to httpresponse to select decompression 
+	  of the response.
+	  Read does now create a decompression object if necessary to decompress
+	  the input stream of the response.
+	* A inteproxy/decompressstream.py:
+	  Add decompress stream class to wrap around an input stream for
+	  decompressed reading.
+
+2012-02-24	Andre Heinecke	<aheinecke at intevation.de>
+	* M inteproxy/proxycore.py:
+	  Add method decompressed_read to read decompressed
+	  data from a compressed response and use it where
+	  data from a httpresponse is read.
+	* M test/test_inteproxy.py:
+	  Disable test in TestInteProxyCompressedConnection for handling
+	  an invalid compressed response, as InteProxy will now crash in
+	  that case.
+	* M inteproxy/httpmessage.py:
+	  Add decompressor parameter to read_entire_message to decompress
+	  the body with it.
+
+2012-02-24	Andre Heinecke	<aheinecke at intevation.de>
+
+	* M inteproxy/proxycore.py:
+	  Add exception handling to compression, return raw data
+	  if it can not be decompressed.
+	  Change decompressed read to allow chained read wrappers
+	* M test/test_inteproxy.py:
+	  Add test to TestInteProxyCompressedConnection for handling
+	  an invalid compressed response
+
+2012-02-23	Andre Heinecke	<aheinecke at intevation.de>
+
+	* M inteproxy/httpmessage.py:
+	  Remove compression handling but still use the read function
+	  of HTTPMessage for reading the entire body.
+	* M inteproxy/proxycore.py:
+	  Move compression handling into the proxycore. Add method
+	  get_decompress_object to select the correct decompression
+	  algorithm and Only do decompression if the client has not
+	  requested a compressed response or if we need to rewrite urls.
+
+2012-02-23	Andre Heinecke	<aheinecke at intevation.de>
+
+	* M inteproxy/httpmessage.py:
+	  Only decompress responses, remove the Content
+	  Encoding header, decide compression on inititalization
+	  of a httpresponse.
+
+2012-02-23	Andre Heinecke	<aheinecke at intevation.de>
+
+	* M test/test_inteproxy.py:
+	  Add TestInteProxyCompressedConnection to test
+	  messages with content-encoding deflate and gzip
+
+2012-02-22	Andre Heinecke	<aheinecke at intevation.de>
+
+	* M inteproxy/proxycore.py:
+	  Add Accept-Encoding header to http_request
+
+2012-02-22	Andre Heinecke	<aheinecke at intevation.de>
+
+	* M inteproxy/httpmessage.py:
+	  Added support for decompressing HTTPMessages on read
+
 2012-01-21	Bjoern Schilberg	<bjoern.schilberg at intevation.de>
 
 	* M setup.py:

Copied: trunk/inteproxy/decompressstream.py (from rev 357, branches/compression/inteproxy/decompressstream.py)
===================================================================
--- trunk/inteproxy/decompressstream.py	                        (rev 0)
+++ trunk/inteproxy/decompressstream.py	2012-03-06 17:18:24 UTC (rev 359)
@@ -0,0 +1,51 @@
+# Copyright (C) 2012 by Intevation GmbH
+# Authors:
+# Bernhard Herzog <bh at intevation.de>
+# Andre Heinecke <aheinecke at intevation.de>
+#
+# This program is free software under the GPL (>=v2)
+# Read the file COPYING coming with the software for details.
+
+""" On the fly decompression of a data stream """
+
+class DecompressStream(object):
+    """A class to wrap around a data stream that contains compressed data.
+
+    The decompression object can be given on initalization.
+    """
+
+    def __init__(self, infile, decompressobj):
+        """Initialize the DecompressStream Object
+
+        The parameter infile is used as input stream
+        and the parameter decompressobj to provide the decompression.
+        """
+        self.infile = infile
+        self.decompressor = decompressobj
+
+    def read(self, amount = -1):
+        """Decompressed the stream and returns the uncompressed data.
+
+        A negative parameter for amount indicates that the complete
+        stream should be decompressed.
+        """
+        decompressed_chunks = []
+        count = 0
+
+        if amount < 0:
+            compressed = self.decompressor.unconsumed_tail
+            compressed += self.infile.read()
+            return self.decompressor.decompress(compressed)
+
+        while count < amount:
+            max_read = amount - count
+            compressed = self.decompressor.unconsumed_tail
+            if not compressed:
+                compressed = self.infile.read(amount)
+                if not compressed:
+                    break
+            deflated = self.decompressor.decompress(compressed, max_read)
+            count += len(deflated)
+            decompressed_chunks.append(deflated)
+
+        return "".join(decompressed_chunks)

Modified: trunk/inteproxy/httpmessage.py
===================================================================
--- trunk/inteproxy/httpmessage.py	2012-03-06 17:17:41 UTC (rev 358)
+++ trunk/inteproxy/httpmessage.py	2012-03-06 17:18:24 UTC (rev 359)
@@ -8,6 +8,8 @@
 """Abstractions for http request and response messages"""
 
 from StringIO import StringIO
+from inteproxy.decompressstream import DecompressStream
+from zlib import decompressobj, MAX_WBITS
 
 class HTTPMessage(object):
 
@@ -66,6 +68,7 @@
         if content_type is not None:
             self.headers["Content-type"] = content_type
         self._body = body
+        self._body_stream = StringIO(self.body)
 
     def get_body(self):
         self.read_entire_message()
@@ -77,12 +80,11 @@
         raise NotImplementedError
 
     def read(self, amount):
-        if self._body_stream is None and self.body_has_been_read():
-            self._body_stream = StringIO(self.body)
         if self._body_stream is not None:
-            return self._body_stream.read(amount)
+            data = self._body_stream.read(amount)
         else:
-            return self.infile.read(amount)
+            data = self.infile.read(amount)
+        return data
 
 
 class HTTPRequestMessage(HTTPMessage):
@@ -144,6 +146,8 @@
         self.version = version
         self.status = status
         self.reason = reason
+        self.__decompress = False
+        self.started_reading = False
 
     def debug_log_message(self, log_function):
         log_function("HTTPResponseMessage: %s %s %s",
@@ -151,6 +155,78 @@
         super(HTTPResponseMessage, self).debug_log_message(log_function)
 
     def read_entire_message(self):
-        if self.body_has_been_read():
+        """ Read the entire message and set the messages body.
+
+        If the optional decompressor parameter is given the
+        body will be decompressed.
+        """
+        if not self.body_has_been_read():
+            self.set_body(self.read())
+
+    def do_decompress(self):
+        """
+        Decompress the input stream on read if possible.
+
+        If the content-encoding of the message is either gzip
+        or deflate the Content-Encoding and Content-Length
+        headers will also be removed.
+        """
+        if self.__decompress:
             return
-        self.set_body(self.infile.read())
+
+        if not self.headers.get("Content-Encoding"):
+            return
+
+        if self.headers.get("Content-Encoding") == "deflate":
+            self.__decompress = "deflate"
+        elif self.headers.get("Content-Encoding") == "gzip":
+            self.__decompress = "gzip"
+
+        if self.__decompress:
+            # Can decompress the input
+            if self.started_reading:
+                self.__decompress = False
+                raise Exception("do_decompress called after first read")
+
+            del self.headers["Content-Encoding"]
+            if self.headers.get("Content-Length"):
+                del self.headers["Content-Length"]
+
+    def read(self, amount = -1):
+        """
+        Read the message up to amount bytes and return a data string of
+        length amount.
+        If do_decompress was called before this will return the data in
+        a decompressed form if possible.
+        """
+
+        if self.started_reading == False and amount != 0:
+            self.started_reading = True
+
+        if self._body_stream is None:
+            if self.__decompress:
+                # Create the decompression object
+                #
+                # On defate decompression -zlib.MAX_WBITS is given to ensure
+                # that non RFC confirming responses as they are sent by most
+                # http servers are decompressed correctly by ignoring a
+                # possibly invlaid header.
+
+                # To decompress gzip with zlib 16 needs to be added to the
+                # wbits parameter
+
+                # See the documention of inflateInit2 at
+                # http://zlib.net/manual.html
+
+                if self.__decompress == "deflate":
+                    self._body_stream = DecompressStream(self.infile,
+                            decompressobj(-MAX_WBITS))
+                elif self.__decompress == "gzip":
+                    self._body_stream = DecompressStream(self.infile,
+                            decompressobj(16 + MAX_WBITS))
+                else:
+                    raise Exception("Invalid decompress method"
+                                    " in HTTPResponse")
+            else:
+                self._body_stream = self.infile
+        return self._body_stream.read(amount)

Modified: trunk/inteproxy/proxycore.py
===================================================================
--- trunk/inteproxy/proxycore.py	2012-03-06 17:17:41 UTC (rev 358)
+++ trunk/inteproxy/proxycore.py	2012-03-06 17:18:24 UTC (rev 359)
@@ -75,6 +75,11 @@
         client_request = self.read_client_request()
 
         #
+        # Make sure that it requests compressed data
+        #
+        self.ensure_encoding_header(client_request)
+
+        #
         # Determine the transcoder to use
         #
         transcoder = self.server.transcoder_map.get_transcoder(self.command,
@@ -222,6 +227,7 @@
 
         extra_headers = [("Host", "%s:%d" % remote_address)]
 
+
         sock = None
 
         if scheme == "http":
@@ -299,6 +305,23 @@
 
         return response_message
 
+    def ensure_encoding_header(self, client_request):
+        """
+        Request compression even if the client did not.
+        This will modify the headers of client_request.
+        """
+        if not client_request.headers.get("Accept-Encoding"):
+            client_request.headers["Accept-Encoding"] = "gzip, deflate"
+            self.should_decompress_response = True
+        elif ( not "gzip" in client_request.headers["Accept-Encoding"] and
+               not "deflate" in client_request.headers["Accept-Encoding"] ):
+            client_request.headers["Accept-Encoding"] = \
+                ", ".join([client_request.headers["Accept-Encoding"],
+                    "gzip", "deflate"])
+            self.should_decompress_response = True
+        else:
+            self.should_decompress_response = False
+
     def send_headers(self, response):
         """Write the HTTP headers to the output stream."""
         for header, value in response.headers.items():
@@ -318,6 +341,9 @@
         do_rewrite = self.have_to_rewrite()
         do_chunked = response.headers.get("Transfer-encoding") == "chunked"
 
+        if do_rewrite or self.should_decompress_response:
+            response.do_decompress()
+
         if do_chunked and do_rewrite:
             self.send_headers(response)
             self.transfer_data_rewrite_chunked(response)

Copied: trunk/test/test_decompressstream.py (from rev 357, branches/compression/test/test_decompressstream.py)
===================================================================
--- trunk/test/test_decompressstream.py	                        (rev 0)
+++ trunk/test/test_decompressstream.py	2012-03-06 17:18:24 UTC (rev 359)
@@ -0,0 +1,44 @@
+# Copyright (C) 2012 by Intevation GmbH
+# Authors:
+# Andre Heinecke <aheinecke at intevation.de>
+#
+# This program is free software under the GPL (>=v2)
+# Read the file COPYING coming with the software for details
+
+"""Tests for the inteproxy.decompressstream module"""
+
+import unittest
+import StringIO
+import zlib
+
+from random import Random
+
+from inteproxy.decompressstream import DecompressStream
+
+class DecompressStreamTest(unittest.TestCase):
+
+    def test_read_amounts(self):
+        """Test for the ChunkedTransferEncodingWriter"""
+
+        DATA = "It's still magic even if you know how it's done."
+
+        compressed_data = zlib.compress(DATA)
+        compressed_stream = StringIO.StringIO(compressed_data)
+        decompressobj = zlib.decompressobj()
+
+        dstream = DecompressStream(compressed_stream, decompressobj)
+
+        # Test reading small "bites"
+        result = dstream.read(5)
+        self.assertEqual(result, DATA[:5])
+        result2 = dstream.read(1)
+        self.assertEqual(result2, DATA[5])
+        result3 = dstream.read(-123)
+        self.assertEqual(result + result2 + result3, DATA)
+
+        # Test reading everything
+        compressed_stream = StringIO.StringIO(compressed_data)
+        decompressobj = zlib.decompressobj()
+
+        dstream = DecompressStream(compressed_stream, decompressobj)
+        self.assertEqual(dstream.read(), DATA)

Modified: trunk/test/test_inteproxy.py
===================================================================
--- trunk/test/test_inteproxy.py	2012-03-06 17:17:41 UTC (rev 358)
+++ trunk/test/test_inteproxy.py	2012-03-06 17:18:24 UTC (rev 359)
@@ -521,3 +521,48 @@
             self.assertEquals(response.status, 200)
             data = response.read()
             self.assertEquals(data, "some text")
+
+class TestInteProxyCompressedConnection(ServerTest):
+    remote_contents = [
+        ("/plain", [("Content-Type", "text/plain")], "not encoded"),
+        ("/gzip",  [("Content-Type", "text/plain"),
+            ("Content-Encoding", "gzip")],
+            base64.b64decode("H4sICNwRRk8AA2Zvby50eHQAS8vP5wIAqGUyfgQAAAA=")),
+        ("/deflate", [("Content-Type", "text/plain"),
+            ("Content-Encoding", "deflate")],
+            base64.b64decode("S8vPBwA="))]
+       # ("/invalid", [("Content-Type", "text/plain"),
+       #     ("Content-Encoding", "deflate")], "foo")]
+
+
+    def test_plain(self):
+        http = httplib.HTTPConnection("localhost", self.server.server_port)
+        http.request("GET", self.remote_server_base_url + "plain")
+        response = http.getresponse()
+        self.assertEquals(response.status, 200)
+        data = response.read()
+        self.assertEquals(data, "not encoded")
+
+    def test_deflate(self):
+        http = httplib.HTTPConnection("localhost", self.server.server_port)
+        http.request("GET", self.remote_server_base_url + "deflate")
+        response = http.getresponse()
+        self.assertEquals(response.status, 200)
+        data = response.read()
+        self.assertEquals(data, "foo")
+
+    def test_gzip(self):
+        http = httplib.HTTPConnection("localhost", self.server.server_port)
+        http.request("GET", self.remote_server_base_url + "gzip")
+        response = http.getresponse()
+        self.assertEquals(response.status, 200)
+        data = response.read()
+        self.assertEquals(data, "foo\n")
+
+    #def test_invalid_data(self):
+    #    http = httplib.HTTPConnection("localhost", self.server.server_port)
+    #    http.request("GET", self.remote_server_base_url + "invalid")
+    #    response = http.getresponse()
+    #    self.assertEquals(response.status, 200)
+    #    data = response.read()
+    #    self.assertEquals(data, "foo")



More information about the Inteproxy-commits mailing list