Closed
HTML API: Reliably parse HTML in get_url_in_content()
Trac ticket: Core-63694

This also decodes the URL, which the previous code didn’t, so
character-reference-encoded strings like `http&colon;//` will be properly decoded as `http://`.
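
As a rough illustration of the decoding difference, the sketch below runs the removed pattern in plain PHP (no WordPress required) and compares the raw capture with a decoded version. The input string is made up, and `html_entity_decode()` is only a stand-in for the decoding that `get_attribute()` performs; it is not the Tag Processor's own decoder.

```php
<?php
// Made-up input: the colon and the ampersand are written as character references.
$html = '<a href="http&#58;//example.com/?a=1&amp;b=2">x</a>';

// The removed pattern captures the attribute text exactly as written in the markup.
preg_match( '/<a\s[^>]*?href=([\'"])(.+?)\1/is', $html, $matches );
$raw = $matches[2];

// Assumption: a generic single-pass decode, used here only to show the expected shape of the result.
$decoded = html_entity_decode( $raw, ENT_QUOTES | ENT_HTML5, 'UTF-8' );

var_dump( $raw, $decoded );
```

Here `$raw` still contains `&#58;` and `&amp;`, while `$decoded` is the URL a browser would actually follow.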
dmsnell committed Aug 26, 2025
commit 71b86cfc7801ae4dc927411bcb79dbcbf22c9142
12 changes: 8 additions & 4 deletions src/wp-includes/formatting.php
@@ -5978,16 +5978,20 @@ function wp_unslash( $value ) {
  *
  * @since 3.6.0
  *
- * @param string $content A string which might contain a URL.
- * @return string|false The found URL.
+ * @param string $content A string which might contain an `A` element with a non-empty `href` attribute.
+ * @return string|false Database-escaped URL via {@see esc_url()} if found, otherwise `false`.
  */
 function get_url_in_content( $content ) {
 	if ( empty( $content ) ) {
 		return false;
 	}

-	if ( preg_match( '/<a\s[^>]*?href=([\'"])(.+?)\1/is', $content, $matches ) ) {
-		return sanitize_url( $matches[2] );
+	$processor = new WP_HTML_Tag_Processor( $content );
+	while ( $processor->next_tag( 'A' ) ) {
+		$href = $processor->get_attribute( 'href' );
+		if ( is_string( $href ) && ! empty( $href ) ) {
+			return sanitize_url( $href );
+		}
 	}

 	return false;
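
To show the kind of input the removed regular expression mishandles, here is a small sketch in plain PHP (no WordPress required). `legacy_href()` is a hypothetical helper written for this illustration, not part of core, and the three inputs are made up.

```php
<?php
// The pattern removed by this change, copied from the diff above.
$pattern = '/<a\s[^>]*?href=([\'"])(.+?)\1/is';

// Hypothetical helper: returns the regex capture, or false when nothing matches.
function legacy_href( string $pattern, string $html ) {
	return preg_match( $pattern, $html, $m ) ? $m[2] : false;
}

// `href=` is matched inside `data-href`, so the wrong attribute's value is captured.
$wrong_attribute = legacy_href( $pattern, '<a data-href="wrong" href="https://example.com/">x</a>' );

// With an empty `href`, `.+?` runs past the closing quote into the next attribute.
$overrun = legacy_href( $pattern, '<a href="" title="Home">x</a>' );

// Unquoted attribute values are valid HTML, but the pattern requires a quote.
$unquoted = legacy_href( $pattern, '<a href=https://example.com/>x</a>' );

var_dump( $wrong_attribute, $overrun, $unquoted );
```

The new loop avoids these cases by asking the Tag Processor for the `href` attribute by name and moving on to the next `A` tag when the value is missing or empty.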