fix: recover latin-1 encoded Location headers on redirects by MAXDVVV · Pull Request #12325 · aio-libs/aiohttp

MAXDVVV · 2026-04-06T18:17:05Z

Problem

When a server sends a Location header containing raw latin-1 encoded bytes (e.g. \xf8 for ø), the redirect URL gets corrupted.

Redirect chain example (from #10047):

https://cornelius-k.dk/synsproeve/
  → Location: https://cornelius-k.dk/synsprøve  (URL-encoded %C3%B8, OK)
  → Location: https://cornelius-k.dk/synspr\xf8ve  (raw latin-1 byte!)
    → aiohttp sees: https://cornelius-k.dk/synspr\udcf8ve  (broken surrogate)
    → 404!
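
For reference, the two encodings of ø that appear in this chain can be compared using only the standard library. This is an illustrative sketch; the variable names are made up for the example and do not come from aiohttp.

```python
from urllib.parse import quote

char = "ø"

# UTF-8 encodes ø as two bytes; percent-encoding those bytes gives the
# form used by the first redirect in the chain.
utf8_bytes = char.encode("utf-8")
percent_encoded = quote(char)  # quote() uses UTF-8 by default

# latin-1 encodes ø as a single byte, which is what the second redirect
# sends raw in its Location header.
latin1_bytes = char.encode("latin-1")

print(utf8_bytes, percent_encoded, latin1_bytes)
```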

Root cause

The HTTP parser decodes header values as UTF-8 with the surrogateescape error handler (http_parser.py L208). Some servers send raw latin-1 bytes in the Location header, even though this violates the RFC. A byte such as \xf8 is not valid UTF-8, so it is decoded to a lone surrogate such as \udcf8. These surrogates then cause URL() to produce a broken URL.
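
The decoding behaviour described here can be reproduced without aiohttp. The sketch below assumes only that the header bytes arrive as shown in the redirect chain; it does not call the parser itself.

```python
# Raw bytes as a server might send them: ø as the single latin-1 byte 0xF8.
raw_header = b"https://cornelius-k.dk/synspr\xf8ve"

# A strict UTF-8 decode rejects the byte outright.
try:
    raw_header.decode("utf-8")
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False

# surrogateescape keeps the byte by mapping 0xF8 to the lone surrogate U+DCF8.
decoded = raw_header.decode("utf-8", "surrogateescape")

# The result is a str, but it can no longer be encoded as strict UTF-8.
try:
    decoded.encode("utf-8")
    strict_encode_ok = True
except UnicodeEncodeError:
    strict_encode_ok = False

# ascii() is used so the surrogate is shown as an escape instead of
# being written to stdout directly.
print(ascii(decoded), strict_decode_ok, strict_encode_ok)
```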

Fix

In the redirect handling code (client.py), after reading the Location header value, detect whether it contains surrogates (that is, whether it fails to encode as strict UTF-8). If so, encode it back to bytes with surrogateescape and decode those bytes as latin-1, recovering the original characters:

'\udcf8' → encode('utf-8', 'surrogateescape') → b'\xf8' → decode('latin-1') → 'ø'

This is a targeted fix that only affects redirect URL processing, not general header decoding.
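
A minimal sketch of the recovery step, written as a standalone function. The name recover_location and its charset parameter are hypothetical and chosen for illustration; this is not the code in the PR's diff.

```python
def recover_location(value: str, charset: str = "latin-1") -> str:
    """Hypothetical helper (not part of aiohttp): undo surrogateescape
    decoding of a header value and re-decode the bytes with `charset`."""
    try:
        # A value without lone surrogates encodes cleanly; leave it alone.
        value.encode("utf-8")
    except UnicodeEncodeError:
        # surrogateescape turns each escaped surrogate back into its
        # original byte, e.g. U+DCF8 becomes 0xF8.
        raw = value.encode("utf-8", "surrogateescape")
        return raw.decode(charset)
    return value


broken = "https://cornelius-k.dk/synspr\udcf8ve"
fixed = recover_location(broken)
untouched = recover_location("https://example.com/plain")
```

One limitation of this approach: if a value mixes correctly decoded UTF-8 characters with escaped bytes, re-decoding the whole value as latin-1 will garble the characters that were already correct.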

Verification

>>> r_url = 'https://cornelius-k.dk/synspr\udcf8ve'
>>> raw = r_url.encode('utf-8', 'surrogateescape')
>>> r_url = raw.decode('latin-1')
>>> r_url
'https://cornelius-k.dk/synsprøve'  # correct!

Fixes #10047


codecov · 2026-04-06T18:22:07Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.11%. Comparing base (e412ccb) to head (f11f79d).
⚠️ Report is 7 commits behind head on master.
✅ All tests successful. No failed tests found.

Additional details and impacted files

@@           Coverage Diff            @@
##           master   #12325    +/-   ##
========================================
  Coverage   99.11%   99.11%            
========================================
  Files         130      130            
  Lines       45558    45658   +100     
  Branches     2404     2406     +2     
========================================
+ Hits        45156    45256   +100     
  Misses        272      272            
  Partials      130      130

Flag	Coverage Δ
CI-GHA	`98.97% <100.00%> (+<0.01%)`	⬆️
OS-Linux	`98.72% <100.00%> (-0.01%)`	⬇️
OS-Windows	`96.97% <100.00%> (-0.03%)`	⬇️
OS-macOS	`97.87% <100.00%> (-0.01%)`	⬇️
Py-3.10.11	`97.43% <100.00%> (+<0.01%)`	⬆️
Py-3.10.20	`97.90% <100.00%> (+<0.01%)`	⬆️
Py-3.11.15	`98.11% <100.00%> (+0.01%)`	⬆️
Py-3.11.9	`97.64% <100.00%> (+<0.01%)`	⬆️
Py-3.12.10	`97.72% <100.00%> (+<0.01%)`	⬆️
Py-3.12.13	`98.20% <100.00%> (+<0.01%)`	⬆️
Py-3.13.12	`98.45% <100.00%> (+<0.01%)`	⬆️
Py-3.14.3	`98.50% <100.00%> (+<0.01%)`	⬆️
Py-3.14.3t	`?`
Py-3.14.4t	`97.51% <100.00%> (?)`
Py-pypy3.11.15-7.3.21	`97.39% <100.00%> (-0.01%)`	⬇️
VM-macos	`97.87% <100.00%> (-0.01%)`	⬇️
VM-ubuntu	`98.72% <100.00%> (-0.01%)`	⬇️
VM-windows	`96.97% <100.00%> (-0.03%)`	⬇️


codspeed-hq · 2026-04-06T18:23:23Z

Merging this PR will not alter performance

✅ 61 untouched benchmarks
⏩ 4 skipped benchmarks¹

Comparing MAXDVVV:fix/redirect-non-ascii-location-10047 (f11f79d) with master (f55503d)

¹ 4 benchmarks were skipped, so the baseline results were used instead.

Dreamsorcerer · 2026-04-06T22:01:07Z

aiohttp/client.py

+                        except (UnicodeEncodeError, UnicodeDecodeError):
+                            try:
+                                raw = r_url.encode("utf-8", "surrogateescape")
+                                r_url = raw.decode("latin-1")


What if it's not latin-1? It seems unreasonable for us to start guessing charsets at random.

If fallback_charset_resolver is set, we could use that instead maybe?

… first, latin-1 fallback

…very Address reviewer feedback: instead of hardcoding latin-1, consult the session's fallback_charset_resolver to determine the charset for recovering non-ASCII Location headers. Latin-1 remains the ultimate fallback per RFC 7230 (historical HTTP/1.1 header encoding). Refs: aio-libs#10047

Dreamsorcerer · 2026-04-12T12:54:49Z

aiohttp/client.py

+                        _raw = r_url.encode("utf-8", "surrogateescape")
+                        _charset = self._resolve_charset(resp, _raw)
+                        r_url = _recover_redirect_location(r_url, _charset)


Surely we just decode it with the charset and lose the new function..?
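
The reviewer's suggestion can be sketched as follows: recover the raw bytes, ask a resolver for the charset, and decode directly. The function name decode_location and the stand-in resolver are hypothetical. The resolver is assumed to take the response and the raw bytes and return a codec name, mirroring how the snippet above calls self._resolve_charset(resp, _raw). Falling back to latin-1 when no resolver is given follows the PR description.

```python
from typing import Callable, Optional

# Assumed resolver shape: (response, raw bytes) -> codec name.
Resolver = Callable[[object, bytes], str]


def decode_location(
    r_url: str, resp: object, resolver: Optional[Resolver] = None
) -> str:
    """Hypothetical sketch: decode the Location bytes with a resolved charset."""
    try:
        r_url.encode("utf-8")
        return r_url  # no escaped bytes, nothing to recover
    except UnicodeEncodeError:
        pass
    raw = r_url.encode("utf-8", "surrogateescape")
    charset = resolver(resp, raw) if resolver is not None else "latin-1"
    # May raise UnicodeDecodeError or LookupError if the resolver's
    # answer does not fit the bytes; a caller would need to handle that.
    return raw.decode(charset)


seen = []


def fake_resolver(resp: object, raw: bytes) -> str:
    # Stand-in resolver that records its input and always answers latin-1.
    seen.append(raw)
    return "latin-1"


result = decode_location("https://cornelius-k.dk/synspr\udcf8ve", None, fake_resolver)
```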

fix: recover latin-1 encoded Location headers on redirects (Fixes aio…

0670799

…-libs#10047)

MAXDVVV requested review from asvetlov and webknjaz as code owners April 6, 2026 18:17

psf-chronographer bot added the bot:chronographer:provided label (There is a change note present in this PR) Apr 6, 2026

github-advanced-security bot found potential problems Apr 6, 2026

aiohttp/client.py (Fixed)

Dreamsorcerer reviewed Apr 6, 2026


zhanlong9890 added 2 commits April 7, 2026 12:25

fix(client): recover redirect Location via surrogateescape with utf-8…

a440222

… first, latin-1 fallback

style(tests): apply isort+black formatting for pre-commit

5ea52e5

Dreamsorcerer added the pr-unfinished label (The PR is unfinished and may need a volunteer to complete it) Apr 7, 2026

Dreamsorcerer reviewed Apr 12, 2026

MAXDVVV wants to merge 4 commits into aio-libs:master from MAXDVVV:fix/redirect-non-ascii-location-10047

3 participants
