[Inteproxy-commits] r215 - in trunk: . inteproxy test

scm-commit at wald.intevation.org
Thu Nov 26 22:30:04 CET 2009


Author: bh
Date: 2009-11-26 22:30:03 +0100 (Thu, 26 Nov 2009)
New Revision: 215

Modified:
   trunk/ChangeLog
   trunk/inteproxy/transcoder.py
   trunk/test/test_inteproxy.py
Log:
The regular expressions used to find URLs to rewrite when the
--rewrite-urls option is given matched too much.  If one line
contained several URLs that had to be rewritten, the regular
expression that matched the first URL could match that URL and the
entire rest of the line.  The rest of the line was then not
processed any further, leaving the other URLs in their original,
non-rewritten form.  The character set used for the wild-cards is
now much more restricted, which should solve the problem.

* inteproxy/transcoder.py (TranscoderMap.add_rule): Restrict the
set of characters matched by the wild-cards by excluding those
listed as excluded in URIs in RFC 2396.

* test/test_inteproxy.py
(TestInteProxyURLRewriting.test_httpproxy_url_rewriting_with_long_lines):
New.  Test case for URL rewriting in long lines that contain
multiple URLs that need to be rewritten.
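
The over-matching described in the log message can be reproduced with a
minimal sketch.  The helper build_url_regex below is hypothetical and
only stands in for the wild-card expansion; it is not InteProxy's
pattern_to_regex, and the assumption that the old path wild-card behaved
like "." is made for illustration only.  The excluded character set is
the one introduced in this revision.

```python
import re

# Characters excluded from URIs (RFC 2396, section 2.4.3), without '%',
# written as a raw string so the regex engine interprets the escapes.
EXCLUDED_CHARS = r'\x00-\x20\x7F<>#"'


def build_url_regex(netloc_chars, path_chars):
    """Hypothetical helper: regex for any http(s) URL whose host and path
    wild-cards are limited to the given character sets."""
    return re.compile("https?://%s*/%s*" % (netloc_chars, path_chars))


line = ("first: https://data.intevation.de/wms and second:"
        " http://frida.intevation.org/wms/simple\n")

# Assumed old behaviour: the path wild-card accepts any character
# except a newline, so the first match runs to the end of the line.
greedy = build_url_regex("[^/]", ".")

# New behaviour: wild-cards stop at characters that cannot occur in a URI.
restricted = build_url_regex("[^/%s]" % EXCLUDED_CHARS,
                             "[^%s]" % EXCLUDED_CHARS)

greedy_matches = greedy.findall(line)
restricted_matches = restricted.findall(line)

print(greedy_matches)
print(restricted_matches)
```

With the unrestricted wild-card the second URL is swallowed by the match
for the first one, so it is never seen on its own.  With the restricted
character sets each URL ends at the following space or newline and both
are found separately.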


Modified: trunk/ChangeLog
===================================================================
--- trunk/ChangeLog	2009-09-15 20:27:48 UTC (rev 214)
+++ trunk/ChangeLog	2009-11-26 21:30:03 UTC (rev 215)
@@ -1,3 +1,24 @@
+2009-11-26  Bernhard Herzog  <bh at intevation.de>
+
+	The regular expressions used when looking for URLs to rewrite when
+	the --rewrite-urls option is used match too much.  It could happen
+	that if one line contained several URLs that had to be rewritten,
+	that the regular expression that matched the first URL actually
+	matched that URL and the entire rest of the line, so that the rest
+	of line would not be processed any further, leaving the other URLs
+	in their original, non-rewritten form.  Now the character set used
+	for the wild-cards is much more restricted, which should solve the
+	problem.
+
+	* inteproxy/transcoder.py (TranscoderMap.add_rule): Restrict the
+	set of characters matched by the wild-cards by excluding those
+	listed as excluded in URIs in RFC 2396.
+
+	* test/test_inteproxy.py
+	(TestInteProxyURLRewriting.test_httpproxy_url_rewriting_with_long_lines):
+	New. Test-case for url replacing in long lines that contain
+	multiple URLs that need to be rewritten.
+
 2009-09-15  Bernhard Herzog  <bh at intevation.de>
 
 	Add some tests with actual HTTPS servers and HTTPS proxies.  The

Modified: trunk/inteproxy/transcoder.py
===================================================================
--- trunk/inteproxy/transcoder.py	2009-09-15 20:27:48 UTC (rev 214)
+++ trunk/inteproxy/transcoder.py	2009-11-26 21:30:03 UTC (rev 215)
@@ -203,10 +203,16 @@
         netloc = rule.host
         if rule.port is not None:
             netloc += ":%d" % rule.port
-        self.rules.append((re.compile(pattern_to_regex(netloc,
-                                                       character_set="[^/]")),
-                           re.compile(pattern_to_regex(rule.path)),
-                           rule))
+        # characters that may not occur in URIs.  See RFC 2396, section 2.4.3
+        # We do not exclude '%' because that's used to escape characters
+        # in URLs, so it's likely to occur in the actual textual
+        # representation of the URLs we are working with.
+        excluded_chars = "\x00-\x20\x7F<>#\""
+        re_netloc = pattern_to_regex(netloc,
+                                     character_set=r"[^/%s]" % excluded_chars)
+        re_path = pattern_to_regex(rule.path,
+                                   character_set=r"[^%s]" % excluded_chars)
+        self.rules.append((re.compile(re_netloc), re.compile(re_path), rule))
 
     def add_rules(self, rules):
         """Adds all transcoder rules in rules"""

Modified: trunk/test/test_inteproxy.py
===================================================================
--- trunk/test/test_inteproxy.py	2009-09-15 20:27:48 UTC (rev 214)
+++ trunk/test/test_inteproxy.py	2009-11-26 21:30:03 UTC (rev 215)
@@ -243,6 +243,11 @@
          "An URL that may be rewritten: https://data.intevation.de/wms\n"
          "and one that may not: http://data.intevation.de/gis/wms\n"
          "and one that may: http://frida.intevation.org/wms/simple\n"),
+        ("/rewrite-06", [("Content-Type", "text/plain")],
+         "An long line: an URL that may be rewritten:"
+         " https://data.intevation.de/wms and one that may not:"
+         " http://data.intevation.de/gis/wms and another one that may:"
+         " http://frida.intevation.org/wms/simple\n"),
         ]
 
 
@@ -339,7 +344,26 @@
                    " http://localhost:%d/frida.intevation.org/wms/simple\n"
                           % (self.server.server_port,
                              self.server.server_port))
+    def test_httpproxy_url_rewriting_with_long_lines(self):
+        http = httplib.HTTPConnection("localhost", self.server.server_port)
+        http.request("GET",
+                     "http://localhost:%d/localhost:%d/rewrite-06"
+                     % (self.server.server_port,
+                        self.remote_server.server_port))
+        response = http.getresponse()
+        self.assertEquals(response.status, 200)
+        data = response.read()
+        self.assertEquals(data,
+                          "An long line: an URL that may be rewritten:"
+                          " http://localhost:%d/data.intevation.de/wms"
+                          " and one that may not:"
+                          " http://data.intevation.de/gis/wms"
+                          " and another one that may:"
+                     " http://localhost:%d/frida.intevation.org/wms/simple\n"
+                          % (self.server.server_port,
+                             self.server.server_port))
 
+
 class TestInteProxyWithExtraProxy(ServerTest):
 
     remote_contents = [


