Commit 15bc8a8

HTML API: Reliably parse HTML in get_url_in_content()
As part of a larger effort in #63694, this utilizes `WP_HTML_Tag_Processor` instead of a regular expression to parse the string passed into `get_url_in_content()`. As a benefit, this also decodes the URL, whereas the previous code didn't, so a URL written with HTML character references (for example, an encoded colon in `http://`) is now returned in its decoded form.

Developed in: WordPress/wordpress-develop#9272
Discussed in: https://core.trac.wordpress.org/ticket/63694

Props dmsnell, jonsurrell, nerrad.
Fixes #63694.

Built from https://develop.svn.wordpress.org/trunk@60665

git-svn-id: https://core.svn.wordpress.org/trunk@60001 1a063a9b-81f0-0310-95a4-ce76da25c4cd
1 parent f26bef7 commit 15bc8a8

2 files changed: 9 additions, 5 deletions

wp-includes/formatting.php

Lines changed: 8 additions & 4 deletions
@@ -5978,16 +5978,20 @@ function wp_unslash( $value ) {
  *
  * @since 3.6.0
  *
- * @param string $content A string which might contain a URL.
- * @return string|false The found URL.
+ * @param string $content A string which might contain an `A` element with a non-empty `href` attribute.
+ * @return string|false Database-escaped URL via {@see esc_url()} if found, otherwise `false`.
  */
 function get_url_in_content( $content ) {
 	if ( empty( $content ) ) {
 		return false;
 	}
 
-	if ( preg_match( '/<a\s[^>]*?href=([\'"])(.+?)\1/is', $content, $matches ) ) {
-		return sanitize_url( $matches[2] );
+	$processor = new WP_HTML_Tag_Processor( $content );
+	while ( $processor->next_tag( 'A' ) ) {
+		$href = $processor->get_attribute( 'href' );
+		if ( is_string( $href ) && ! empty( $href ) ) {
+			return sanitize_url( $href );
+		}
 	}
 
 	return false;
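
The decoding difference described in the commit message can be illustrated outside WordPress. The following is a minimal sketch in plain PHP, not part of the commit: the sample markup and the `example.com` URL are made up for illustration, and `html_entity_decode()` is used only as a stand-in for the attribute decoding that an HTML parser performs. It shows that the removed pattern captures the raw attribute text, character references included.

```php
<?php
// Hypothetical input: the colon of the URL scheme is written as the numeric
// character reference &#58;, which is valid inside an HTML attribute value.
$content = '<p>See <a class="ref" href="http&#58;//example.com/?a=1&amp;b=2">this page</a>.</p>';

// The pattern removed by this commit. It captures the attribute text exactly
// as written in the markup, without decoding character references.
$raw = false;
if ( preg_match( '/<a\s[^>]*?href=([\'"])(.+?)\1/is', $content, $matches ) ) {
	$raw = $matches[2];
}

// Stand-in for the decoding step an HTML parser applies to attribute values.
// This is plain PHP, not the WordPress HTML API.
$decoded = html_entity_decode( $raw, ENT_QUOTES | ENT_HTML5, 'UTF-8' );

echo $raw, "\n";     // http&#58;//example.com/?a=1&amp;b=2
echo $decoded, "\n"; // http://example.com/?a=1&b=2
```

With the old code, the first (still encoded) string is what was handed to `sanitize_url()`. The new code passes the value returned by `get_attribute( 'href' )`, which per the commit message is already decoded.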

wp-includes/version.php

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@
  *
  * @global string $wp_version
  */
-$wp_version = '6.9-alpha-60664';
+$wp_version = '6.9-alpha-60665';
 
 /**
  * Holds the WordPress DB revision, increments when changes are made to the WordPress DB schema.
