Skip to content

Conversation

@mitogh
Copy link
Member

@mitogh mitogh commented May 8, 2022

When an image or a URL is downloaded using WordPress function download_url and if the provided URL does not contain the file extension, instead of using tmp try to guess the extension of the file by using the provided mime type of the file.

An important consideration is to make sure the mime type is supported by the current WordPress installation to prevent download and upload of non supported mime types on the current installation.

Trac ticket: https://core.trac.wordpress.org/ticket/54738

This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.

mitogh added 2 commits May 8, 2022 17:15
When an image or a URL is downloaded using WordPress function `download_url` and if the provided URL does not contain the
file extension, instead of using `tmp` try to guess the extension
of the file by using the provided mime type of the file.

An important consideration is to make sure the mime type is supported
by the current WordPress installation to prevent download and upload
of non supported mime types on the current installation.
Make sure files follow the rules established in this repository
Copy link
Member

@dmsnell dmsnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mitogh this seems like a reasonably good idea to fill in an extension if an HTTP response includes a file but doesn't provide one and if it also provides a MIME type.

I wonder though how this interacts with the inferred filename from the Content-Disposition header logic immediately above it. Supposing that WordPress receives a filename, it may rename the file once, and then immediately rename it again if we then append an extension.

Do you think it would be fitting to introduce a new filter, such as download_url_filename which could resolve discrepancies with the filename? Or alternatively, maybe we could introduce this directly for now in the code without a filter, but still with a kind of stack of filename transformations…

$wanted_name = $tmpfname;

// Check if the server suggested a filename for this response.
$content_disposition = 
if ( $content_disposition ) {
	…
	$wanted_name = $name_from_dispotion;
}

// Check if the filename is missing an extension, but the server hints at its MIME type.
$content_type = 
if ( $content_type ) {
	// Check if extension exists and if it matches the MIME type.
	// If not, then add the extension.
	$wanted_name = "{$wanted_name}.{$guessed_extension}";
}

// Rename the downloaded file, if requested.
if ( $wanted_name !== $tmpfname ) {
	// rename, delete if it failed
}

Copy link
Member

@dmsnell dmsnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before I may have been too focused on the diff to miss that we already set the filename based on the content-disposition.

I left one new comment that's mainly just about ensuring we're doing what the intend to do with the extension, but otherwise this looks good.

$valid_mime_types = array_flip( get_allowed_mime_types() );
if ( ! empty( $valid_mime_types[ $mime_type ] ) ) {
$extensions = explode( '|', $valid_mime_types[ $mime_type ] );
$new_image_name = str_replace( '.tmp', ".{$extensions[0]}", $tmpfname );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we're going to intend on replacing the extension, it would be helpful to ensure that we only replace a terminating .tmp instead of any .tmp because a file could be named something like file.tmp.dat.tmp and then we'd transform more than we want.

here we could use a PCRE function to ensure this is the end that we're replacing.

$new_image_name = preg_replace( '/\.tmp$/', ".{$extensions[0]}", $tmpfname );

We could also do this with substr(), if we're confident that pathinfo is returning tmp only when the file ends in .tmp.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the PHP docs, PATHINFO_EXTENSION only returns the last extension if a path has more than one:

https://www.php.net/manual/en/function.pathinfo.php

So it seems to me that we should have that confidence.

@joedolson joedolson mentioned this pull request Sep 30, 2024
@github-actions
Copy link

github-actions bot commented Mar 3, 2025

A commit was made that fixes the Trac ticket referenced in the description of this pull request.

SVN changeset: 59902
GitHub commit: b4f0fc9

This PR will be closed, but please confirm the accuracy of this and reopen if there is more work to be done.

@github-actions github-actions bot closed this Mar 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants