[Inteproxy-commits] r336 - in trunk: . inteproxy test

scm-commit at wald.intevation.org
Thu Jan 5 13:21:37 CET 2012


Author: teichmann
Date: 2012-01-05 13:21:36 +0100 (Thu, 05 Jan 2012)
New Revision: 336

Added:
   trunk/inteproxy/chunkedwriter.py
   trunk/test/test_chunkedwriter.py
Modified:
   trunk/
   trunk/ChangeLog
   trunk/inteproxy/proxycore.py
   trunk/inteproxy/transcoder.py
Log:
Merged revisions 322-333 via svnmerge from 
svn+ssh://teichmann@svn.wald.intevation.org/inteproxy/branches/streaming

........
  r322 | teichmann | 2011-12-23 10:50:40 +0100 (Fr, 23 Dez 2011) | 1 line
  
  Added a function to the transcoder to do url rewriting more than once.
........
  r323 | teichmann | 2011-12-23 17:43:28 +0100 (Fr, 23 Dez 2011) | 1 line
  
  Added writer for chunked transfer encoding.
........
  r324 | teichmann | 2011-12-23 18:16:09 +0100 (Fr, 23 Dez 2011) | 1 line
  
  Added method to transcode URLs while streaming data.
........
  r325 | teichmann | 2011-12-23 18:49:22 +0100 (Fr, 23 Dez 2011) | 1 line
  
  Use rfind() instead of find() to boost performance in the streaming/rewriting mode.
........
  r326 | teichmann | 2011-12-23 18:56:06 +0100 (Fr, 23 Dez 2011) | 1 line
  
  Fixed indentation (c&p mistake)
........
  r327 | teichmann | 2011-12-25 18:32:10 +0100 (So, 25 Dez 2011) | 1 line
  
  Call new chunk/rewrite code if these conditions are met by the incoming response. Needs testing!
........
  r328 | teichmann | 2012-01-02 10:34:52 +0100 (Mo, 02 Jan 2012) | 1 line
  
  proxycore.py(wrap_read_write_debug): Forgot self.
........
  r329 | teichmann | 2012-01-03 17:07:03 +0100 (Di, 03 Jan 2012) | 1 line
  
  Added doc strings to chunk writer.
........
  r330 | teichmann | 2012-01-03 17:19:50 +0100 (Di, 03 Jan 2012) | 1 line
  
  Added doc strings to transcoder.
........
  r331 | teichmann | 2012-01-03 17:29:56 +0100 (Di, 03 Jan 2012) | 1 line
  
  Added doc strings to proxycore
........
  r332 | teichmann | 2012-01-03 18:31:29 +0100 (Di, 03 Jan 2012) | 1 line
  
  Fixed Content-Length bug when the content of a non-chunked response is rewritten.
........
  r333 | teichmann | 2012-01-05 13:04:46 +0100 (Do, 05 Jan 2012) | 1 line
  
  Added unit tests for module inteproxy.chunkedwriter.
........



Property changes on: trunk
___________________________________________________________________
Name: svnmerge-integrated
   - /branches/streaming:1-321
   + /branches/streaming:1-335

Modified: trunk/ChangeLog
===================================================================
--- trunk/ChangeLog	2012-01-05 12:19:04 UTC (rev 335)
+++ trunk/ChangeLog	2012-01-05 12:21:36 UTC (rev 336)
@@ -1,3 +1,62 @@
+2012-01-05	Sascha L. Teichmann	<sascha.teichmann at intevation.de>
+
+	* test/test_chunkedwriter.py: New. Unit tests for the chunkedwriter
+	  module.
+
+2012-01-03	Sascha L. Teichmann	<sascha.teichmann at intevation.de>
+
+	* inteproxy/proxycore.py: Fixed: If a response is rewritten in
+	  the non-chunked case, the value of Content-Length changes.
+	  So this header can only be written out after the replacement
+	  is done and the correct size is known.
+
+2012-01-03	Sascha L. Teichmann	<sascha.teichmann at intevation.de>
+
+	* inteproxy/chunkedwriter.py, inteproxy/transcoder.py,
+	  inteproxy/proxycore.py: Added doc strings.
+
+2012-01-02	Sascha L. Teichmann	<sascha.teichmann at intevation.de>
+
+	* inteproxy/proxycore.py(wrap_read_write_debug): Forgot self.
+
+2011-12-25	Sascha L. Teichmann	<sascha.teichmann at intevation.de>
+
+	* inteproxy/proxycore.py: Call the new rewrite/chunking code
+	  if the incoming response uses transfer encoding chunked
+	  and URL rewriting is active. Otherwise the old code path
+	  is used. Needs testing!
+
+2011-12-23	Sascha L. Teichmann	<sascha.teichmann at intevation.de>
+
+	* inteproxy/proxycore.py(transfer_chunked_rewrite): Fixed
+	  indentation problem (c&p mistake).
+
+2011-12-23	Sascha L. Teichmann	<sascha.teichmann at intevation.de>
+
+	* inteproxy/proxycore.py(transfer_chunked_rewrite): Use rfind()
+	  instead of find() to break input into fewer fragments. This
+	  improves the performance a lot!
+
+2011-12-23	Sascha L. Teichmann	<sascha.teichmann at intevation.de>
+
+	* inteproxy/proxycore.py(transfer_chunked_rewrite): Added method
+	  to stream the data from incoming response to the output in chunks
+	  transcoding the URLs on the fly. TODO: Integrate it.
+
+2011-12-23	Sascha L. Teichmann	<sascha.teichmann at intevation.de>
+
+	* inteproxy/chunkedwriter.py(ChunkedTransferEncodingWriter): New.
+	  Added class to write HTTP chunked transfer encoding. Useful
+	  if the input is given as short byte arrays to be aggregated
+	  into chunks of a given size which are streamed out.
+
+2011-12-23	Sascha L. Teichmann	<sascha.teichmann at intevation.de>
+
+	* inteproxy/transcoder.py: Refactored a bit. Introduced
+	  function url_rewriter which returns a function which 
+	  can be used to do url rewriting for a given string.
+	  Useful when rewriting is called more than once.
+
 2011-06-16  Bjoern Schilberg <bjoern.schilberg at intevation.de>
 
 	* M server/doc/source/gettingStarted.rstr:
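
The Content-Length entry from 2012-01-03 above can be illustrated with a minimal Python 3 sketch. It is not the code from proxycore.py; the helper name rewrite_then_set_length, the sample headers and the localhost prefix are assumptions made up for this example. The point is only the ordering: rewrite the body first, then derive the header from the rewritten body.

```python
def rewrite_then_set_length(headers, body, rewrite):
    """Rewrite the body first, then derive Content-Length from the result."""
    new_body = rewrite(body)
    new_headers = dict(headers)
    # The length is only known once the replacement has been done.
    new_headers["Content-Length"] = str(len(new_body))
    return new_headers, new_body


def sample_rewrite(data):
    # Hypothetical rewrite: route a known URL through a local proxy prefix.
    return data.replace(b"http://example.org",
                        b"http://localhost:64609/example.org")


headers = {"Content-Type": "text/xml", "Content-Length": "31"}
body = b"<a href='http://example.org/x'>"
new_headers, new_body = rewrite_then_set_length(headers, body, sample_rewrite)
```

Sending the original header value with the rewritten body would announce fewer bytes than are actually written, which is the bug the entry describes.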

Copied: trunk/inteproxy/chunkedwriter.py (from rev 333, branches/streaming/inteproxy/chunkedwriter.py)
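
The contents of chunkedwriter.py are not part of this mail. As a rough Python 3 sketch of what such a writer might look like, assuming only the interface visible in the proxycore.py diff (a constructor taking a write callable and a size, plus append() and finish()), consider the following. The class name ChunkedWriterSketch and all internals are assumptions, not the committed implementation.

```python
class ChunkedWriterSketch:
    """Aggregate small byte strings into HTTP chunks of at most `size` bytes."""

    def __init__(self, write, size=4096):
        self.write = write
        self.size = size
        self.buffer = bytearray()

    def _emit(self, payload):
        # One chunk is: hex length, CRLF, payload, CRLF.
        self.write(b"%x\r\n" % len(payload) + bytes(payload) + b"\r\n")

    def append(self, data):
        self.buffer += data
        # Stream out full chunks as soon as enough data has accumulated.
        while len(self.buffer) >= self.size:
            self._emit(self.buffer[:self.size])
            del self.buffer[:self.size]

    def finish(self):
        # Flush the remainder, then write the terminating zero-length chunk.
        if self.buffer:
            self._emit(self.buffer)
            self.buffer.clear()
        self.write(b"0\r\n\r\n")


out = []
writer = ChunkedWriterSketch(out.append, size=4)
writer.append(b"ab")
writer.append(b"cdef")
writer.finish()
```

With a chunk size of 4, the six input bytes leave the writer as one full chunk, one short chunk and the terminator.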

Modified: trunk/inteproxy/proxycore.py
===================================================================
--- trunk/inteproxy/proxycore.py	2012-01-05 12:19:04 UTC (rev 335)
+++ trunk/inteproxy/proxycore.py	2012-01-05 12:21:36 UTC (rev 336)
@@ -30,6 +30,7 @@
 from inteproxy.httpmessage import HTTPRequestMessage, HTTPResponseMessage
 from inteproxy.httpconnection import connect_tcp, connect_http_connect, \
      connect_ssl, SocketHTTPConnection, parse_netloc
+from inteproxy.chunkedwriter import ChunkedTransferEncodingWriter
 
 
 # same as the BaseHTTPRequestHandler method, but as a standalone function:
@@ -104,7 +105,6 @@
             # check for fees and access constraints and run a dialog
             if self.server.show_terms_dialog:
                 handle_fees_and_access_constraints(remote_url, response)
-            self.rewrite_urls(response)
             self.handle_response(response)
 
         self.log_debug("request finished")
@@ -299,6 +299,12 @@
 
         return response_message
 
+    def send_headers(self, response):
+        """Write the HTTP headers to the output stream."""
+        for header, value in response.headers.items():
+            self.log_debug("header to client: %s:%r", header, value)
+            self.send_header(header, value)
+        self.end_headers()
 
     def handle_response(self, response):
         # The HTTP version in the reply generated by send_response is
@@ -309,26 +315,30 @@
         self.protocol_version = response.version
         self.send_response(response.status, response.reason)
 
-        for header, value in response.headers.items():
-            self.log_debug("header to client: %s:%r", header, value)
-            self.send_header(header, value)
-        self.end_headers()
+        do_rewrite = self.have_to_rewrite()
+        do_chunked = response.headers.get("Transfer-encoding") == "chunked"
 
-        transfer_encoding = response.headers.get("Transfer-encoding")
-        self.transfer_data(response.read, self.wfile.write,
-                           chunked = (transfer_encoding == "chunked"))
+        if do_chunked and do_rewrite:
+            self.send_headers(response)
+            self.transfer_data_rewrite_chunked(response)
+        else:
+            if do_rewrite:
+                self.rewrite_urls(response, do_rewrite)
+            self.send_headers(response)
+            self.transfer_data(response.read, self.wfile.write,
+                               chunked = do_chunked)
 
-    def transfer_data(self, read, write, length=None, chunked=False):
-        """Transfer data from one 'file' to another in chunks
-
-        The files are given by their read and write methods so it
-        doesn't have to be a file.  The read parameter must be callable
-        with an integer argument indicating the maximum number of bytes
-        to read and the write parameter must be callable with a string.
-        If the parameter chunked is true, the method uses the 'chunked'
-        transfer encoding when writing the data.
+    def transfer_data_rewrite_chunked(self, response):
+        """Transfer the incoming data from the origin server in chunks
+        and rewrite URLs in the content at the same time.
         """
 
+        transcoder_map = self.server.transcoder_map
+        prefix = self.server.get_inteproxy_url()
+        rewrite = transcoder_map.url_rewriter(prefix, self.log_debug)
+        self.transfer_chunked_rewrite(rewrite, response.read, self.wfile.write)
+
+    def wrap_read_write_debug(self, read, write):
         # wrap the read/write functions if debug logging is active so
         # that the data read from the server and written to the client
         # is logged.
@@ -352,6 +362,61 @@
                 self.log_debug("to client: %r", limit_length(data))
                 orig_write(data)
 
+        return read, write
+
+    def transfer_chunked_rewrite(self, rewrite, read, write,
+                                 separator='>', length=4096):
+        """Transfers data from read() to write() in chunks. The
+        data is split at a given separator.
+        """
+
+        read, write = self.wrap_read_write_debug(read, write)
+        writer = ChunkedTransferEncodingWriter(write, length)
+        write = writer.append
+
+        data = []
+        append = data.append
+
+        while True:
+            chunk = read(length)
+            if not chunk:
+                break
+
+            pos = 0
+            while True:
+                idx = chunk.rfind(separator, pos)
+
+                if idx > 0:
+                    rest = chunk[pos:idx]
+                    append(rest)
+                    rewritten = rewrite(''.join(data))
+                    del data[:]
+                    append(separator)
+                    write(rewritten)
+                    rewritten = None
+                    pos = idx+1
+                else:
+                    append(chunk[pos:] if pos else chunk)
+                    break
+
+        rewritten = rewrite(''.join(data))
+        write(rewritten)
+        rewritten = None
+        writer.finish()
+
+    def transfer_data(self, read, write, length=None, chunked=False):
+        """Transfer data from one 'file' to another in chunks
+
+        The files are given by their read and write methods so it
+        doesn't have to be a file.  The read parameter must be callable
+        with an integer argument indicating the maximum number of bytes
+        to read and the write parameter must be callable with a string.
+        If the parameter chunked is true, the method uses the 'chunked'
+        transfer encoding when writing the data.
+        """
+
+        read, write = self.wrap_read_write_debug(read, write)
+
         # Now transfer the data in blocks of max_chunk_size
         max_chunk_size = 4096
         while 1:
@@ -372,7 +437,12 @@
         if chunked:
             write("0\r\n\r\n")
 
-    def rewrite_urls(self, response):
+    def have_to_rewrite(self):
+        """Returns whether URL rewriting is necessary."""
+        return self.server.rewrite_urls and not urlparse.urlsplit(self.path)[0]
+
+
+    def rewrite_urls(self, response, force_rewrite=False):
         """Rewrites URLs in the response if enabled in the server
 
         This method rewrites URLs in the response if the request is a
@@ -381,7 +451,7 @@
         the server.  The actual rewriting is done by the server's
         transcoder_map.
         """
-        if not urlparse.urlsplit(self.path)[0] and self.server.rewrite_urls:
+        if force_rewrite or self.have_to_rewrite():
             transcoder_map = self.server.transcoder_map
             prefix = self.server.get_inteproxy_url()
             response.body = transcoder_map.rewrite_urls(response.body, prefix,
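
The idea behind transfer_chunked_rewrite can be shown in isolation. The following Python 3 sketch is a simplified variant, not the committed method: the function name stream_rewrite, the block iterable and the sample rewrite are assumptions for illustration. It relies on the same premise as the diff, namely that a URL never contains the separator, so text up to the last separator in the buffer can be rewritten and flushed safely.

```python
def stream_rewrite(blocks, rewrite, write, separator=">"):
    """Rewrite text arriving in blocks, flushing only at separator boundaries."""
    pending = ""
    for block in blocks:
        pending += block
        # rfind() locates the last separator, so each block needs one flush.
        idx = pending.rfind(separator)
        if idx >= 0:
            # Everything up to and including the last separator is complete.
            write(rewrite(pending[:idx + 1]))
            pending = pending[idx + 1:]
    if pending:
        write(rewrite(pending))


def sample_rewrite(text):
    # Hypothetical rewrite standing in for the transcoder's URL rewriting.
    return text.replace("http://example.org", "PROXY/example.org")


out = []
blocks = ["<a>http://exam", "ple.org/x</a><b", ">tail"]
stream_rewrite(blocks, sample_rewrite, out.append)
result = "".join(out)
```

The URL in the sample input is split across two blocks and is still rewritten, because the incomplete tail after the last separator is held back until the next block arrives.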

Modified: trunk/inteproxy/transcoder.py
===================================================================
--- trunk/inteproxy/transcoder.py	2012-01-05 12:19:04 UTC (rev 335)
+++ trunk/inteproxy/transcoder.py	2012-01-05 12:21:36 UTC (rev 336)
@@ -284,6 +284,41 @@
                                 (scheme, netloc, path, query, fragment),
                                 rule)
 
+    def build_url_patterns(self):
+        """Builds a list of regex patterns using the
+        hosts and paths from the rules.
+        """
+
+        return ["%s%s" % (host_regex.pattern, path_regex.pattern)
+                for host_regex, path_regex, classname in self.rules]
+
+    def build_url_regex(self):
+        """Glues together the single url patterns into a large regex
+        prefixed by http:// or https://.
+        """
+
+        return ("(?:http|https)://(?:%s)" %
+                "|".join("(" + pattern + ")"
+                         for pattern in self.build_url_patterns()))
+
+    def url_rewriter(self, prefix, log_debug):
+        """Compiles the url regexes to be used more than once.
+        Returns a function which takes as a single argument the
+        byte data to apply the url rewriting on. This function
+        returns the rewritten data.
+        """
+
+        pattern = re.compile(self.build_url_regex())
+
+        def make_inteprox_url(match):
+            url = match.group(match.lastindex)
+            return prefix + url
+
+        def rewrite(data):
+            return pattern.sub(make_inteprox_url, data)
+
+        return rewrite
+
     def rewrite_urls(self, data, prefix, log_debug):
         """Prefix all known URLs in data with prefix.
 
@@ -301,15 +336,9 @@
         messages, usually the log_debug method of the
         InteProxyHTTPRequestHandler.
         """
-        url_patterns = []
-        for host_regex, path_regex, classname in self.rules:
-            url_patterns.append("%s%s"
-                                % (host_regex.pattern, path_regex.pattern))
-        regex = ("(?:http|https)://(?:"
-                 + "|".join("(" + pattern + ")"
-                            for pattern in url_patterns)
-                 + ")")
 
+        regex = self.build_url_regex()
+
         def make_inteprox_url(match):
             url = match.group(match.lastindex)
             return prefix + url
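
The compile-once pattern introduced by url_rewriter can be tried outside the class with a short Python 3 sketch. The function name make_url_rewriter, the two sample patterns and the localhost prefix are assumptions for this example; only the regex construction and the use of match.lastindex follow the diff above.

```python
import re


def make_url_rewriter(patterns, prefix):
    """Compile one alternation regex once and return a reusable rewrite function."""
    regex = re.compile("(?:http|https)://(?:%s)"
                       % "|".join("(" + p + ")" for p in patterns))

    def add_prefix(match):
        # lastindex is the group of the alternative that matched; the
        # scheme sits outside the groups and is therefore dropped.
        return prefix + match.group(match.lastindex)

    def rewrite(data):
        return regex.sub(add_prefix, data)

    return rewrite


# Hypothetical host/path patterns standing in for the transcoder rules.
patterns = [r"maps\.example\.org/\w+", r"data\.example\.org/\w+"]
rewrite = make_url_rewriter(patterns, "http://localhost:64609/")
first = rewrite('<a href="https://maps.example.org/wms">')
second = rewrite('<a href="http://other.example.org/wms">')
```

The returned closure can be called once per fragment while streaming, so the regex is built a single time per response rather than once per call.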

Copied: trunk/test/test_chunkedwriter.py (from rev 333, branches/streaming/test/test_chunkedwriter.py)


