[Inteproxy-commits] r336 - in trunk: . inteproxy test
scm-commit@wald.intevation.org
Thu Jan 5 13:21:37 CET 2012
Author: teichmann
Date: 2012-01-05 13:21:36 +0100 (Thu, 05 Jan 2012)
New Revision: 336
Added:
trunk/inteproxy/chunkedwriter.py
trunk/test/test_chunkedwriter.py
Modified:
trunk/
trunk/ChangeLog
trunk/inteproxy/proxycore.py
trunk/inteproxy/transcoder.py
Log:
Merged revisions 322-333 via svnmerge from
svn+ssh://teichmann@svn.wald.intevation.org/inteproxy/branches/streaming
........
r322 | teichmann | 2011-12-23 10:50:40 +0100 (Fr, 23 Dez 2011) | 1 line
Added a function to the transcoder to do url rewriting more than once.
........
r323 | teichmann | 2011-12-23 17:43:28 +0100 (Fr, 23 Dez 2011) | 1 line
Added writer for chunked transfer encoding.
........
r324 | teichmann | 2011-12-23 18:16:09 +0100 (Fr, 23 Dez 2011) | 1 line
Added method to transcode URLs while streaming data.
........
r325 | teichmann | 2011-12-23 18:49:22 +0100 (Fr, 23 Dez 2011) | 1 line
use rfind() instead of find() to boost the performance while using the streaming/rewriting mode.
........
r326 | teichmann | 2011-12-23 18:56:06 +0100 (Fr, 23 Dez 2011) | 1 line
Fixed indentation (c&p mistake)
........
r327 | teichmann | 2011-12-25 18:32:10 +0100 (So, 25 Dez 2011) | 1 line
Call new chunk/rewrite code if these conditions are met by the incoming response. Needs testing!
........
r328 | teichmann | 2012-01-02 10:34:52 +0100 (Mo, 02 Jan 2012) | 1 line
proxycore.py(wrap_read_write_debug): forgot self
........
r329 | teichmann | 2012-01-03 17:07:03 +0100 (Di, 03 Jan 2012) | 1 line
Added doc strings to chunk writer.
........
r330 | teichmann | 2012-01-03 17:19:50 +0100 (Di, 03 Jan 2012) | 1 line
Added doc strings to transcoder.
........
r331 | teichmann | 2012-01-03 17:29:56 +0100 (Di, 03 Jan 2012) | 1 line
Added doc strings to proxycore
........
r332 | teichmann | 2012-01-03 18:31:29 +0100 (Di, 03 Jan 2012) | 1 line
Fixed Content-Length bug if the content of a non-chunked response is rewritten.
........
r333 | teichmann | 2012-01-05 13:04:46 +0100 (Do, 05 Jan 2012) | 1 line
Added unit tests for module inteproxy.chunkedwriter.
........
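
The Content-Length fix in r332 can be illustrated with a minimal sketch. It assumes the body is fully buffered (the non-chunked case) and that the rewrite function may change its length. The names finalize_headers and rewrite are hypothetical and are not taken from proxycore.py.

```python
def finalize_headers(headers, body, rewrite):
    """Rewrite the body first, then derive Content-Length from the result.

    Hypothetical helper: 'headers' is a plain dict, 'body' is bytes and
    'rewrite' is any callable mapping bytes to bytes.
    """
    new_body = rewrite(body)
    new_headers = dict(headers)
    # The length is only known once the replacement is done.
    new_headers["Content-Length"] = str(len(new_body))
    return new_headers, new_body
```

Sending the headers before the rewrite would announce the old length, so the client would read too few or too many bytes.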
Property changes on: trunk
___________________________________________________________________
Name: svnmerge-integrated
- /branches/streaming:1-321
+ /branches/streaming:1-335
Modified: trunk/ChangeLog
===================================================================
--- trunk/ChangeLog 2012-01-05 12:19:04 UTC (rev 335)
+++ trunk/ChangeLog 2012-01-05 12:21:36 UTC (rev 336)
@@ -1,3 +1,62 @@
+2012-01-05 Sascha L. Teichmann <sascha.teichmann at intevation.de>
+
+ * test/test_chunkedwriter.py: New. Unit tests for the chunkedwriter
+ module.
+
+2012-01-03 Sascha L. Teichmann <sascha.teichmann at intevation.de>
+
+ * inteproxy/proxycore.py: Fixed: If a response is rewritten in
+ the non-chunked case, the value of Content-Length changes.
+ So this header can only be written out after the replacement
+ is done and the correct size is known.
+
+2012-01-03 Sascha L. Teichmann <sascha.teichmann at intevation.de>
+
+ * inteproxy/chunkedwriter.py, inteproxy/transcoder.py,
+ inteproxy/proxycore.py: Added doc strings.
+
+2012-01-02 Sascha L. Teichmann <sascha.teichmann at intevation.de>
+
+ * inteproxy/proxycore.py(wrap_read_write_debug): Forgot self.
+
+2011-12-25 Sascha L. Teichmann <sascha.teichmann at intevation.de>
+
+ * inteproxy/proxycore.py: Call the new rewrite/chunking code
+ if the incoming response uses chunked transfer encoding
+ and URL rewriting is active. Otherwise the old code path
+ is used. Needs testing!
+
+2011-12-23 Sascha L. Teichmann <sascha.teichmann at intevation.de>
+
+ * inteproxy/proxycore.py(transfer_chunked_rewrite): Fixed
+ indentation problem (c&p mistake).
+
+2011-12-23 Sascha L. Teichmann <sascha.teichmann at intevation.de>
+
+ * inteproxy/proxycore.py(transfer_chunked_rewrite): Use rfind()
+ instead of find() to break input into fewer fragments. This
+ improves the performance a lot!
+
+2011-12-23 Sascha L. Teichmann <sascha.teichmann at intevation.de>
+
+ * inteproxy/proxycore.py(transfer_chunked_rewrite): Added method
+ to stream the data from incoming response to the output in chunks
+ transcoding the URLs on the fly. TODO: Integrate it.
+
+2011-12-23 Sascha L. Teichmann <sascha.teichmann at intevation.de>
+
+ * inteproxy/chunkedwriter.py(ChunkedTransferEncodingWriter): New.
+ Added class to write HTTP chunked transfer encoding. Useful
+ if the input is given as short byte arrays to be aggregated
+ into chunks of a given size which are streamed out.
+
+2011-12-23 Sascha L. Teichmann <sascha.teichmann at intevation.de>
+
+ * inteproxy/transcoder.py: Refactored a bit. Introduced
+ function url_rewriter which returns a function which
+ can be used to do url rewriting for a given string.
+ Useful when rewriting is called more than once.
+
2011-06-16 Bjoern Schilberg <bjoern.schilberg at intevation.de>
* M server/doc/source/gettingStarted.rstr:
Copied: trunk/inteproxy/chunkedwriter.py (from rev 333, branches/streaming/inteproxy/chunkedwriter.py)
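
The copied chunkedwriter.py is not shown in this mail. As a rough illustration of what such a writer does, here is a minimal sketch in Python 3. It assumes only what the diff reveals: the constructor takes a write callable and a chunk length, and the object offers append() and finish(). The class name ChunkedWriterSketch and all internals are hypothetical and may differ from the real module.

```python
class ChunkedWriterSketch:
    """Buffers small byte strings and emits them as HTTP/1.1 chunks."""

    def __init__(self, write, chunk_size=4096):
        self.write = write            # callable taking bytes
        self.chunk_size = chunk_size
        self.buffer = bytearray()

    def _emit(self, payload):
        # One chunk: hex length, CRLF, payload, CRLF.
        self.write(b"%x\r\n" % len(payload) + bytes(payload) + b"\r\n")

    def append(self, data):
        # Collect input and flush every full chunk_size block.
        self.buffer += data
        while len(self.buffer) >= self.chunk_size:
            self._emit(self.buffer[:self.chunk_size])
            del self.buffer[:self.chunk_size]

    def finish(self):
        # Flush the remainder, then send the terminating zero-length chunk.
        if self.buffer:
            self._emit(self.buffer)
            del self.buffer[:]
        self.write(b"0\r\n\r\n")
```

Aggregating before writing keeps many short fragments from each becoming a tiny chunk of its own on the wire.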
Modified: trunk/inteproxy/proxycore.py
===================================================================
--- trunk/inteproxy/proxycore.py 2012-01-05 12:19:04 UTC (rev 335)
+++ trunk/inteproxy/proxycore.py 2012-01-05 12:21:36 UTC (rev 336)
@@ -30,6 +30,7 @@
from inteproxy.httpmessage import HTTPRequestMessage, HTTPResponseMessage
from inteproxy.httpconnection import connect_tcp, connect_http_connect, \
connect_ssl, SocketHTTPConnection, parse_netloc
+from inteproxy.chunkedwriter import ChunkedTransferEncodingWriter
# same as the BaseHTTPRequestHandler method, but as a standalone function:
@@ -104,7 +105,6 @@
# check for fees and access constraints and run a dialog
if self.server.show_terms_dialog:
handle_fees_and_access_constraints(remote_url, response)
- self.rewrite_urls(response)
self.handle_response(response)
self.log_debug("request finished")
@@ -299,6 +299,12 @@
return response_message
+    def send_headers(self, response):
+        """Write the HTTP headers to the output stream."""
+        for header, value in response.headers.items():
+            self.log_debug("header to client: %s:%r", header, value)
+            self.send_header(header, value)
+        self.end_headers()
def handle_response(self, response):
# The HTTP version in the reply generated by send_response is
@@ -309,26 +315,30 @@
self.protocol_version = response.version
self.send_response(response.status, response.reason)
- for header, value in response.headers.items():
- self.log_debug("header to client: %s:%r", header, value)
- self.send_header(header, value)
- self.end_headers()
+ do_rewrite = self.have_to_rewrite()
+ do_chunked = response.headers.get("Transfer-encoding") == "chunked"
- transfer_encoding = response.headers.get("Transfer-encoding")
- self.transfer_data(response.read, self.wfile.write,
- chunked = (transfer_encoding == "chunked"))
+ if do_chunked and do_rewrite:
+ self.send_headers(response)
+ self.transfer_data_rewrite_chunked(response)
+ else:
+ if do_rewrite:
+ self.rewrite_urls(response, do_rewrite)
+ self.send_headers(response)
+ self.transfer_data(response.read, self.wfile.write,
+ chunked = do_chunked)
- def transfer_data(self, read, write, length=None, chunked=False):
- """Transfer data from one 'file' to another in chunks
-
- The files are given by their read and write methods so it
- doesn't have to be a file. The read parameter must be callable
- with an integer argument indicating the maximum number of bytes
- to read and the write parameter must be callable with a string.
- If the parameter chunked is true, the method uses the 'chunked'
- transfer encoding when writing the data.
+ def transfer_data_rewrite_chunked(self, response):
+ """Transfers the incoming data of the origin server in chunks
+ and do url rewriting of the content at the same time.
"""
+ transcoder_map = self.server.transcoder_map
+ prefix = self.server.get_inteproxy_url()
+ rewrite = transcoder_map.url_rewriter(prefix, self.log_debug)
+ self.transfer_chunked_rewrite(rewrite, response.read, self.wfile.write)
+
+ def wrap_read_write_debug(self, read, write):
# wrap the read/write functions if debug logging is active so
# that the data read from the server and written to the client
# is logged.
@@ -352,6 +362,61 @@
self.log_debug("to client: %r", limit_length(data))
orig_write(data)
+ return read, write
+
+    def transfer_chunked_rewrite(self, rewrite, read, write,
+                                 separator='>', length=4096):
+        """Transfers data from read() to write() in chunks. The
+        data is split by a given separator.
+        """
+
+        read, write = self.wrap_read_write_debug(read, write)
+        writer = ChunkedTransferEncodingWriter(write, length)
+        write = writer.append
+
+        data = []
+        append = data.append
+
+        while True:
+            chunk = read(length)
+            if not chunk:
+                break
+
+            pos = 0
+            while True:
+                idx = chunk.rfind(separator, pos)
+
+                if idx > 0:
+                    rest = chunk[pos:idx]
+                    append(rest)
+                    rewritten = rewrite(''.join(data))
+                    del data[:]
+                    append(separator)
+                    write(rewritten)
+                    rewritten = None
+                    pos = idx+1
+                else:
+                    append(chunk[pos:] if pos else chunk)
+                    break
+
+        rewritten = rewrite(''.join(data))
+        write(rewritten)
+        rewritten = None
+        writer.finish()
+
+ def transfer_data(self, read, write, length=None, chunked=False):
+ """Transfer data from one 'file' to another in chunks
+
+ The files are given by their read and write methods so it
+ doesn't have to be a file. The read parameter must be callable
+ with an integer argument indicating the maximum number of bytes
+ to read and the write parameter must be callable with a string.
+ If the parameter chunked is true, the method uses the 'chunked'
+ transfer encoding when writing the data.
+ """
+
+ read, write = self.wrap_read_write_debug(read, write)
+
# Now transfer the data in blocks of max_chunk_size
max_chunk_size = 4096
while 1:
@@ -372,7 +437,12 @@
if chunked:
write("0\r\n\r\n")
- def rewrite_urls(self, response):
+    def have_to_rewrite(self):
+        """Returns whether URL rewriting is necessary."""
+        return self.server.rewrite_urls and not urlparse.urlsplit(self.path)[0]
+
+
+ def rewrite_urls(self, response, force_rewrite=False):
"""Rewrites URLs in the response if enabled in the server
This method rewrites URLs in the response if the request is a
@@ -381,7 +451,7 @@
the server. The actual rewriting is done by the server's
transcoder_map.
"""
- if not urlparse.urlsplit(self.path)[0] and self.server.rewrite_urls:
+ if force_rewrite or self.have_to_rewrite():
transcoder_map = self.server.transcoder_map
prefix = self.server.get_inteproxy_url()
response.body = transcoder_map.rewrite_urls(response.body, prefix,
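
The idea behind transfer_chunked_rewrite can be sketched as a small generator. It assumes that a URL never contains the separator, so cutting the stream at the last '>' of each block never splits a URL in two. This is a simplified Python 3 variant, not the committed code: it keeps the separator with the preceding fragment, and the name rewrite_stream is hypothetical.

```python
def rewrite_stream(blocks, rewrite, separator=">"):
    """Yield rewritten fragments, cutting only at the last separator."""
    pending = []
    for block in blocks:
        idx = block.rfind(separator)
        if idx < 0:
            # No safe cut point in this block, keep collecting.
            pending.append(block)
            continue
        # Everything up to and including the last separator is complete.
        pending.append(block[:idx + 1])
        yield rewrite("".join(pending))
        # The tail may hold the start of a URL, so hold it back.
        pending = [block[idx + 1:]]
    tail = "".join(pending)
    if tail:
        yield rewrite(tail)
```

Using rfind() rather than find() means at most one rewrite call per block read, which is presumably where the speed-up noted in the ChangeLog comes from.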
Modified: trunk/inteproxy/transcoder.py
===================================================================
--- trunk/inteproxy/transcoder.py 2012-01-05 12:19:04 UTC (rev 335)
+++ trunk/inteproxy/transcoder.py 2012-01-05 12:21:36 UTC (rev 336)
@@ -284,6 +284,41 @@
(scheme, netloc, path, query, fragment),
rule)
+    def build_url_patterns(self):
+        """Builds a list of regex patterns using the
+        hosts and paths from the rules.
+        """
+
+        return ["%s%s" % (host_regex.pattern, path_regex.pattern)
+                for host_regex, path_regex, classname in self.rules]
+
+    def build_url_regex(self):
+        """Glues together the single url patterns into a large regex
+        prefixed by http:// or https://.
+        """
+
+        return ("(?:http|https)://(?:%s)" %
+                "|".join("(" + pattern + ")"
+                         for pattern in self.build_url_patterns()))
+
+    def url_rewriter(self, prefix, log_debug):
+        """Compiles the url regexes to be used more than once.
+        Returns a function which takes as a single argument the
+        byte data to apply the url rewriting on. This function
+        returns the rewritten data.
+        """
+
+        pattern = re.compile(self.build_url_regex())
+
+        def make_inteprox_url(match):
+            url = match.group(match.lastindex)
+            return prefix + url
+
+        def rewrite(data):
+            return pattern.sub(make_inteprox_url, data)
+
+        return rewrite
+
def rewrite_urls(self, data, prefix, log_debug):
"""Prefix all known URLs in data with prefix.
@@ -301,15 +336,9 @@
messages, usually the log_debug method of the
InteProxyHTTPRequestHandler.
"""
- url_patterns = []
- for host_regex, path_regex, classname in self.rules:
- url_patterns.append("%s%s"
- % (host_regex.pattern, path_regex.pattern))
- regex = ("(?:http|https)://(?:"
- + "|".join("(" + pattern + ")"
- for pattern in url_patterns)
- + ")")
+ regex = self.build_url_regex()
+
def make_inteprox_url(match):
url = match.group(match.lastindex)
return prefix + url
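
A stand-alone sketch of the url_rewriter approach follows. It assumes the rules reduce to a plain list of host-plus-path regex strings. The function name make_url_rewriter, the example hosts and the prefix are hypothetical. Because the capturing groups exclude the scheme, the matched http:// or https:// is dropped and replaced by the prefix.

```python
import re


def make_url_rewriter(url_patterns, prefix):
    """Compile the combined regex once and return a reusable rewriter."""
    alternatives = "|".join("(" + p + ")" for p in url_patterns)
    pattern = re.compile("(?:http|https)://(?:%s)" % alternatives)

    def add_prefix(match):
        # lastindex names the alternative that matched (scheme excluded).
        return prefix + match.group(match.lastindex)

    def rewrite(data):
        return pattern.sub(add_prefix, data)

    return rewrite


# Hypothetical rules and proxy prefix, for illustration only.
rewrite = make_url_rewriter(
    [r"example\.org/wms", r"maps\.example\.net/ows"],
    "http://localhost:8080/")
result = rewrite("see https://maps.example.net/ows?x=1 and http://other.org/")
```

Compiling once pays off in the streaming case, where the returned function is called for every fragment.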
Copied: trunk/test/test_chunkedwriter.py (from rev 333, branches/streaming/test/test_chunkedwriter.py)