Skip to content
Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Next Next commit
Move script data length checks to top of loop
  • Loading branch information
sirreal committed Aug 6, 2025
commit 6ad9951fd97873227a3ed23fa28910af72f73fc4
16 changes: 9 additions & 7 deletions src/wp-includes/html-api/class-wp-html-tag-processor.php
Original file line number Diff line number Diff line change
Expand Up @@ -1495,14 +1495,22 @@ private function skip_script_data(): bool {

while ( false !== $at && $at < $doc_length ) {
$at += strcspn( $html, '-<', $at );
/*
* Ultimately a SCRIPT closer (`</script>`) must be found or this function will
* return false.
* `</script` is the longest sequence that can be matched, so subsequent length checks
* are redundant.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this patch is good, but before we merge, can we also note in this comment that we’re looking for other strings and they are covered here? I think the comment at the moment is slightly misleading, but we also check for <!-- for instance, and it’s currently incidental I think that terminating false is the same for all places within the SCRIPT.

imagine, however, that we wanted to signal something different if we failed to exit the SCRIPT because there was no end-tag vs. that we ended inside some escaped state? then, by returning early at this point we removed our ability to find the <!-- that transitions state.

I believe in review that the code is solid, but I want to make sure we leave this note for the future as a warning to reconsider the check if changing things. I don’t know that these are all currently caught in the tests (which of course would be another great way to cover this risk, by adding new tests for each of these cases, if we can properly think them up).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I've updated the comment to clarify things.

I do have a number of tests to add to this. I can push them to this branch.

Given how complicated script tag parsing is, I'm tempted to extract them to their own test suite like wpHtmlTagProcessor-scriptParsing. Do you have thoughts on that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pulled in your comment changes from #9397 and made some further tweaks. I'm pretty happy with it now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

your third example in the comment is perfect. I was probably lazy in not suggesting that in my comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extract them to their own test suite like wpHtmlTagProcessor-scriptParsing. Do you have thoughts on that?

not that I consider it a huge priority, but I think this is a great idea and would make it clearer for us to be more comprehensive in the tests. I think for every branch of the state machine in the HTML spec we could have data providers generating example inputs for them.

if ( $at + 8 >= $doc_length ) {
return false;
}

/*
* For all script states a "-->" transitions
* back into the normal unescaped script mode,
* even if that's the current state.
*/
if (
$at + 2 < $doc_length &&
'-' === $html[ $at ] &&
'-' === $html[ $at + 1 ] &&
'>' === $html[ $at + 2 ]
Expand All @@ -1512,10 +1520,6 @@ private function skip_script_data(): bool {
continue;
}

if ( $at + 1 >= $doc_length ) {
return false;
}

/*
* Everything of interest past here starts with "<".
* Check this character and advance position regardless.
Expand All @@ -1537,7 +1541,6 @@ private function skip_script_data(): bool {
* parsing after updating the state.
*/
if (
$at + 2 < $doc_length &&
'!' === $html[ $at ] &&
'-' === $html[ $at + 1 ] &&
'-' === $html[ $at + 2 ]
Expand All @@ -1561,7 +1564,6 @@ private function skip_script_data(): bool {
* proceed scanning to the next potential token in the text.
*/
if ( ! (
$at + 6 < $doc_length &&
( 's' === $html[ $at ] || 'S' === $html[ $at ] ) &&
( 'c' === $html[ $at + 1 ] || 'C' === $html[ $at + 1 ] ) &&
( 'r' === $html[ $at + 2 ] || 'R' === $html[ $at + 2 ] ) &&
Expand Down