@sirreal
Member

@sirreal sirreal commented Jun 27, 2025

Prevent the wp_kses_normalize_entities function from transforming inputs like "&amp;#39;" to "&#39;", changing their value. That transformation is not normalization: it changes the meaning of the input and results in significantly different HTML.
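
To illustrate why this is a change of value rather than a normalization, here is a small Python sketch. It is not WordPress code: it uses Python's standard `html.unescape` as a stand-in for how a browser decodes character references in text content, and the helper name `rendered_text` is made up for this example.

```python
import html

def rendered_text(source):
    """Decode character references once, roughly as a browser does for text content."""
    return html.unescape(source)

escaped_text = "&amp;#39;"  # HTML source for the literal five characters &#39;
apostrophe = "&#39;"        # HTML source for a single apostrophe

print(rendered_text(escaped_text))  # &#39;
print(rendered_text(apostrophe))    # '

# A value-preserving normalization only changes spelling, not the decoded text:
print(rendered_text("&#039;") == rendered_text("&#39;"))  # True
```

The first two inputs decode to different text, so rewriting one into the other changes what the reader sees. Padding "&#39;" to "&#039;" leaves the decoded text unchanged.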

Trac ticket: https://core.trac.wordpress.org/ticket/63630

✅ (merged) This change includes #9095 which should be reviewed and landed first.


This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.

@github-actions

github-actions bot commented Jun 27, 2025

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props jonsurrell, dmsnell.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

@sirreal sirreal requested review from dmsnell and ellatrix June 27, 2025 17:44
@github-actions

Test using WordPress Playground

The changes in this pull request can be previewed and tested using a WordPress Playground instance.

WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser.

Some things to be aware of

  • The Plugin and Theme Directories cannot be accessed within Playground.
  • All changes will be lost when closing a tab with a Playground instance.
  • All changes will be lost when refreshing the page.
  • A fresh instance is created each time the link below is clicked.
  • Every time this pull request is updated, a new ZIP file containing all changes is created. If changes are not reflected in the Playground instance,
    it's possible that the most recent build failed, or has not completed. Check the list of workflow runs to be sure.

For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation.

Test this pull request with WordPress Playground.

@sirreal sirreal force-pushed the fix-63630-prevent-incorrect-normalize-numeric-char-refs branch from 1af9cfc to c25967a Compare July 10, 2025 16:12
@sirreal
Member Author

sirreal commented Jul 10, 2025

PR 9095 landed in [60446]; this change is now unblocked.

Member

@dmsnell dmsnell left a comment

While I think it would be best to find another reviewer (perhaps @xknown or @johnbillion or @mdawaffe), I think this is sound at face value and it’s not breaking tests.

The fix makes sense theoretically and behaves as we expect.

As a side note, this entire function is ripe for replacement with the HTML API whereby we can reliably resolve encoding and escaping issues, but for now this change brings an immediate improvement without raising new questions about legacy bugs or interfaces.

This change could break behavior with old sites, but as demonstrated in the comment in the code, I believe that any existing code relying on the broken behavior is bound to be broken in some way already, making this change a wash.

For HTML saved as intended, this preserves existing behaviors.

@dmsnell
Member

dmsnell commented Jul 14, 2025

as a clarification, I wanted to note that while this introduces a change of behavior, it’s not doing so in a way I would expect it to break things.

previously, those characters might be corrupted through Core, and if some plugin looks for them and tries to repair them, perhaps that code would now see different data coming through the filter stack.

in all of the cases where the behavior is different though, the existing options are fundamentally broken because the corruption happened by Core at the start. this should only improve the situation.

@sirreal sirreal force-pushed the fix-63630-prevent-incorrect-normalize-numeric-char-refs branch from c25967a to c325e48 Compare August 5, 2025 14:07
@sirreal
Member Author

sirreal commented Aug 5, 2025

I shared this and requested review at the July 16, 2025 Core devchat.

I plan to land this in the next few days unless there are reviews that raise concerns or other issues.

@sirreal sirreal requested a review from Copilot August 7, 2025 07:23
sirreal added 2 commits August 7, 2025 09:23
Add test cases to the `wp_kses_normalize_entities` test to cover
https://core.trac.wordpress.org/ticket/63630
The wp_kses_normalize_entities function should not decode double-encoded
inputs like `&amp;#2E;` to `&#2E;`. Ensure that the normalization steps
are processed in the correct order so that the input is normalized and
its value is preserved.
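
The ordering problem described in this commit message can be modeled in a few lines of Python. This is a simplified sketch, not the WordPress implementation: it assumes the function first escapes every `&` and then restores recognized named and numeric references in separate passes. The patterns, the tiny allow-list, and the names `restore_named`, `restore_numeric`, and `normalize` are illustrative assumptions.

```python
import re

# Hypothetical, tiny allow-list; the real function consults a much longer one.
ALLOWED_NAMED = {"amp", "lt", "gt", "quot"}

def restore_named(text):
    """Turn `&amp;name;` back into `&name;` for allow-listed names."""
    def repl(match):
        name = match.group(1)
        return f"&{name};" if name in ALLOWED_NAMED else match.group(0)
    return re.sub(r"&amp;([A-Za-z]{2,8}[0-9]{0,2});", repl, text)

def restore_numeric(text):
    """Turn `&amp;#39;` and `&amp;#x27;` back into `&#39;` and `&#x27;`."""
    text = re.sub(r"&amp;#([0-9]{1,7});", r"&#\1;", text)
    return re.sub(r"&amp;#[Xx]([0-9A-Fa-f]{1,6});", r"&#x\1;", text)

def normalize(text, numeric_first):
    """Escape every `&`, then restore recognized references in the given order."""
    text = text.replace("&", "&amp;")
    if numeric_first:
        steps = [restore_numeric, restore_named]
    else:
        steps = [restore_named, restore_numeric]
    for step in steps:
        text = step(text)
    return text

# Named pass first: restoring `&amp;amp;` creates a new `&amp;#39;` that the
# numeric pass then restores a second time, changing the value.
print(normalize("&amp;#39;", numeric_first=False))  # &#39;

# Numeric pass first: the escaped input survives unchanged.
print(normalize("&amp;#39;", numeric_first=True))   # &amp;#39;
```

In this model both orders still pass a genuine "&#39;" through unchanged and still escape a bare "&". Only the double-encoded input is affected.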
@sirreal sirreal force-pushed the fix-63630-prevent-incorrect-normalize-numeric-char-refs branch from c325e48 to 133bd8e Compare August 7, 2025 07:23

@sirreal sirreal requested review from Copilot and removed request for ellatrix August 7, 2025 07:27

pento pushed a commit that referenced this pull request Aug 7, 2025
… references.

Fixes an issue where `wp_kses_normalize_entities` would transform inputs like "&amp;#39;" into "&#39;", changing the intended HTML text.

This behavior has been present since the initial version of KSES was introduced in [649].

[2896] applied the normalization to post content for users without the "unfiltered_html" capability.

Developed in #9099.

Props jonsurrell, dmsnell, sirlouen.
Fixes #63630.


git-svn-id: https://develop.svn.wordpress.org/trunk@60616 602fd350-edb4-49c9-b593-d223f7449a82
@github-actions

github-actions bot commented Aug 7, 2025

A commit was made that fixes the Trac ticket referenced in the description of this pull request.

SVN changeset: 60616
GitHub commit: b91b757

This PR will be closed, but please confirm the accuracy of this and reopen if there is more work to be done.

@github-actions github-actions bot closed this Aug 7, 2025
@sirreal sirreal deleted the fix-63630-prevent-incorrect-normalize-numeric-char-refs branch August 7, 2025 07:59
github-actions bot pushed a commit to platformsh/wordpress-performance that referenced this pull request Aug 7, 2025
markjaquith pushed a commit to markjaquith/WordPress that referenced this pull request Aug 7, 2025
jonnynews pushed a commit to spacedmonkey/wordpress-develop that referenced this pull request Sep 24, 2025