View Issue Details

ID: 14091
Project: Bug reports
Category: Survey taking
View Status: public
Last Update: 2023-05-29 11:13
Reporter: tbart
Assigned To: c_schmitz
Priority: high
Severity: minor
Status: closed
Resolution: reopened
Product Version: 3.13.x
Summary: 14091: Filenames of uploads starting with special characters truncated with invalid setlocale
Description

Non-ASCII characters (e.g. umlauts such as äöü) at the beginning of uploaded file names are cut off.

Steps To Reproduce

When taking a survey that contains a file upload field:

  1. Choose a file whose name starts with an umlaut
  2. Observe that the leading umlaut is cut off from the stored file name
Additional Information

This is due to locale-awareness of PHP's basename().
Also see https://stackoverflow.com/questions/45268499/php-basename-and-pathinfo-with-multibytes-utf-8-file-names/45268539#45268539 for other related functions with the same feature.

Adding

setlocale(LC_ALL, 'en_US.UTF8');

in index.php fixes it, but I am sure this is not the right place to add this.

My system locale (on debian) is correctly set (apart from LC_ALL, which normally is just an override and should not be set).

Please either respect the system's LANG or set the locale to some UTF-8 compatible one.

Tags: No tags attached.
Attached Files:
Bug heat: 20
Complete LimeSurvey version number (& build): 3.27.23
I will donate to the project if issue is resolved: No
Browser:
Database type & version: MySQL 5.5.60
Server OS (if known): debian
Webserver software & version (if known): apache 2.4.10 or nginx
PHP Version: 5.6.36 or 7.4.25

Relationships

related to 17718 (closed, c_schmitz): sanitize_filename didn't really fix filename for all system
has duplicate 18844 (new): In file upload question, file names are obtained incorrectly, if file name starts with double-byte characters.
related to 18432 (closed, gabrieljenik): Can't upload and view files with Hebrew names

Activities

LouisGac

2019-01-10 17:40

developer   ~50169

Yes, we have hard filtering of file names for security reasons.

tbart

2019-01-14 17:40

reporter   ~50188

Last edited: 2022-02-04 08:34

I understand file names get filtered, but this does not seem related. Why does setting the correct locale circumvent the filtering?
I feel like filtering should be done after correctly interpreting the code points with the correct locale.

But I may be wrong.

If only ASCII (or even a more reduced subset) is allowed, a note should advise users to rename their files (and only list allowed characters). Though this seems pretty uncommon. There should be standard functions to sanitize the string and frankly, umlauts don't seem all too dangerous :)

DenisChenu

2019-01-15 08:40

developer   ~50192

Last edited: 2022-02-04 08:34

«Yes, we have hard filtering of file names for security reasons.» ???

Filtering äöü has nothing to do with security, and these characters are not filtered like this in the 'Upload question' type.

@tbart: where is it filtered like this? And what is the broken locale?

tbart

2019-01-15 09:06

reporter   ~50194

Last edited: 2022-02-04 08:34

Your filenames do not start with special characters. See screenshot attached.

The locale should be en_US.UTF-8, at least that's what the environment sets for LANG:

set | egrep "^(LANG|LC)"

LANG=en_US.UTF-8
LANGUAGE=en_US:en
LC_NUMERIC=de_AT.utf8
LC_PAPER=de_AT.UTF-8
LC_TIME=de_AT.utf8

DenisChenu

2019-01-15 10:00

developer   ~50205

Last edited: 2022-02-04 08:34

My filename was äöü.png ;), but my locale is OK.

Can you show your current locale in PHP?
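One way to answer this is to query the locale from inside PHP itself, since the web server's environment can differ from a login shell. The following is a minimal sketch, not code from LimeSurvey: the helper name `describeLocale` is hypothetical, while calling `setlocale()` with `'0'` as the locale argument is the documented way to read the current setting without changing it.

```php
<?php
// Hypothetical helper: report the locale PHP is actually running with.
function describeLocale()
{
    return array(
        // Passing '0' returns the current setting without changing it.
        'LC_ALL'   => setlocale(LC_ALL, '0'),
        'LC_CTYPE' => setlocale(LC_CTYPE, '0'),
        // Environment as seen by the PHP process (false when unset).
        'LANG'     => getenv('LANG'),
    );
}

print_r(describeLocale());
```

Run through the web server rather than the command line, this shows the values that matter for uploads.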

tbart

2019-01-15 10:37

reporter   ~50207

Last edited: 2022-02-04 08:34

Sorry, I misread your screenshot and thought the left column were the filenames, my bad.

phpinfo() gives me:
LANGUAGE en_US:en
LANG C

It's still strange why only umlauts at the beginning get stripped off. This should not be locale dependent!

ollehar

2021-03-10 22:43

administrator   ~63212

Last edited: 2022-02-04 08:34

Please update to the latest version and check if the bug can still be reproduced. Thank you.

tbart

2021-03-11 10:40

reporter   ~63290

Last edited: 2022-02-04 08:34

I can confirm this is still broken in Version 3.25.17+210309.
Attached is a sample survey+file. Uploading the file results in a file name "sterreich.png"

Adding

setlocale(LC_ALL, 'en_US.UTF8');

in index.php still fixes it.

österreich.png (861 bytes)

DenisChenu

2021-11-04 12:14

developer   ~67112

Last edited: 2022-02-04 08:34

@ollehar: does this need a fix here?
Is it a server issue or not?

Otherwise, a simple fix:

if(empty(setlocale(LC_ALL,0)) {
    setlocale(LC_ALL, 'en_US.UTF8');
}

I have no way to reproduce this: I would need to break my PHP setup ;)

DenisChenu

2021-11-04 12:18

developer   ~67113

Last edited: 2022-02-04 08:34

@tbart : can you check with ONLY setlocale(LC_CTYPE, 'en_US.UTF8'); ?

DenisChenu

2021-11-04 15:41

developer   ~67116

Last edited: 2022-02-04 08:34

Way to reproduce: in config.php, add setlocale(LC_CTYPE,"C");

I must check whether I can fix this without setlocale (Microsoft Windows complicates this, and I am unsure en.utf8 exists everywhere).

DenisChenu

2021-11-04 15:47

developer   ~67118

Last edited: 2022-02-04 08:34

With LC_CTYPE set to C:

$filename = 'österreich.png'
basename($filename) = 'sterreich.png'

DenisChenu

2021-11-04 15:48

developer   ~67119

Last edited: 2022-02-04 08:34

basename() is locale aware, so for it to see the correct basename with multibyte character paths, the matching locale must be set using the setlocale() function. If path contains characters which are invalid for the current locale, the behavior of basename() is undefined.

https://www.php.net/manual/en/function.basename.php
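Given that, one option is to avoid basename() for this step altogether. Below is a minimal sketch of a byte-wise alternative; it is not the change made in the pull requests for this issue, and the helper name `localeIndependentBasename` is hypothetical. It assumes UTF-8 input, where the bytes for `/` and `\` never occur inside a multibyte sequence, and it treats a backslash as a path separator on every platform, which basename() only does on Windows.

```php
<?php
// Hypothetical helper: extract the last path component byte-wise,
// so the result does not depend on the current LC_CTYPE setting.
function localeIndependentBasename($path)
{
    // Treat both separators alike and drop trailing ones.
    $path = rtrim(str_replace('\\', '/', $path), '/');
    $pos = strrpos($path, '/');
    return $pos === false ? $path : substr($path, $pos + 1);
}

echo localeIndependentBasename('/tmp/uploads/österreich.png'), "\n";
```

Any sanitizing of the resulting name would still have to happen afterwards.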

tbart

2021-11-10 11:23

reporter   ~67219

Last edited: 2022-02-04 08:34

This is on a Debian and/or Ubuntu server.
I just found out that as a default, they set LANG to C in /etc/apache2/envvars.

So

if(empty(setlocale(LC_ALL,0)))
[..]

(with 3 closing brackets :-) ) is not true, so this does not help.

setlocale(LC_CTYPE, 'en_US.UTF8');

alone does help as well.
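Since the current locale reads "C" rather than empty, a check along these lines would have to compare against "C" explicitly. A minimal sketch under that assumption: the helper name `ensureUtf8Ctype` and the candidate locale names are hypothetical, and which locales are installed differs per system. It relies on setlocale() accepting an array of candidates and returning false when none can be set.

```php
<?php
// Hypothetical helper: switch LC_CTYPE to a UTF-8 locale only when the
// current one is the plain "C"/"POSIX" locale. Returns the locale in
// effect afterwards, or false if no candidate is installed.
function ensureUtf8Ctype(array $candidates)
{
    $current = setlocale(LC_CTYPE, '0');
    if ($current !== 'C' && $current !== 'POSIX') {
        return $current; // Already configured by the administrator.
    }
    // setlocale() tries each array entry in order and returns false
    // when none of them is available on this system.
    return setlocale(LC_CTYPE, $candidates);
}

// Candidate names are assumptions; available locales differ per system.
$result = ensureUtf8Ctype(array('C.UTF-8', 'en_US.UTF-8', 'en_US.UTF8'));
echo $result === false ? "no UTF-8 locale available\n" : "LC_CTYPE: $result\n";
```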

On a not so unrelated sidenote:
This only works as long as Apache does not have mod_perl2 enabled as well, starting on Ubuntu 20.04 at least. See https://bugs.php.net/bug.php?id=81596 for details.
It seems to be generally discouraged to setlocale() in an application, so I am unsure as to whether my suggested "fix" for this issue is good or not. Maybe I did not understand nikic correctly in that thread and it is only discouraged for strftime()?

However, as Debian/Ubuntu (and most likely all other Debian-based distros), which I guess make up a huge percentage of all LimeSurvey installations, set "C" as their default locale, some fix needs to be devised.
Maybe a documentation hint is necessary (mentioning the mod_perl2 issue would be practical as well, it took me hours to find this)?
Setting the locale via /etc/apache2/envvars is not thoroughly tested by me, I cannot state whether this is an actual fix that works without any side effects.

How is your/everyone else's server setup so it seems it does not use "C" as a default locale?

What I still don't get is why utf-8 characters in the middle of a filename work with the "C" locale and those at the beginning don't. This really seems like a limesurvey bug, still.

DenisChenu

2021-11-10 12:00

developer   ~67226

Last edited: 2022-02-04 08:34

Oops, the links to the pull requests were missing:
3.x https://github.com/LimeSurvey/LimeSurvey/pull/2134
master https://github.com/LimeSurvey/LimeSurvey/pull/2133

galads

2021-11-12 10:44

reporter   ~67273

Last edited: 2022-02-04 08:34

Discussion is ongoing on GitHub.

DenisChenu

2021-11-12 10:45

developer   ~67274

Last edited: 2022-02-04 08:34

Added as related: 17718: sanitize_filename didn't really fix filename for all system
https://bugs.limesurvey.org/view.php?id=17718

c_schmitz

2022-02-05 19:56

administrator   ~68214

Last edited: 2022-02-05 19:59

My patch for 17718 only fixes it for version 5.

DenisChenu

2022-09-27 13:21

developer   ~71970

Fixed in 5.
The issue is specific to 3.X.
OK to close this one?

Issue History

Date Modified Username Field Change
2018-09-24 16:21 tbart New Issue
2019-01-10 17:40 LouisGac Assigned To => LouisGac
2019-01-10 17:40 LouisGac Status new => closed
2019-01-10 17:40 LouisGac Resolution open => no change required
2019-01-10 17:40 LouisGac Note Added: 50169
2019-01-14 17:40 tbart Note Added: 50188
2019-01-15 08:40 DenisChenu Status closed => feedback
2019-01-15 08:40 DenisChenu Resolution no change required => reopened
2019-01-15 08:40 DenisChenu Note Added: 50192
2019-01-15 08:40 DenisChenu File Added: Capture d’écran du 2019-01-15 08-40-13.png
2019-01-15 08:51 DenisChenu Note Edited: 50192
2019-01-15 09:06 tbart File Added: umlauts_at_the_beginning_cut_off.png
2019-01-15 09:06 tbart Note Added: 50194
2019-01-15 09:06 tbart Status feedback => assigned
2019-01-15 10:00 DenisChenu Note Added: 50205
2019-01-15 10:37 tbart Note Added: 50207
2021-03-10 22:43 ollehar Assigned To LouisGac =>
2021-03-10 22:43 ollehar Status assigned => feedback
2021-03-10 22:43 ollehar Note Added: 63212
2021-03-11 10:40 tbart Note Added: 63290
2021-03-11 10:40 tbart File Added: limesurvey_survey_185162.lss
2021-03-11 10:40 tbart File Added: österreich.png
2021-03-11 10:40 tbart Status feedback => new
2021-03-11 11:38 ollehar Priority none => high
2021-10-07 14:09 ollehar Zoho Project Synchronization => |Yes|
2021-10-07 14:11 ollehar Assigned To => ollehar
2021-10-07 14:11 ollehar Status new => acknowledged
2021-11-04 12:10 DenisChenu Assigned To ollehar => DenisChenu
2021-11-04 12:14 DenisChenu Note Added: 67112
2021-11-04 12:18 DenisChenu Note Added: 67113
2021-11-04 15:41 DenisChenu Note Added: 67116
2021-11-04 15:42 DenisChenu Status acknowledged => confirmed
2021-11-04 15:42 DenisChenu Complete LimeSurvey version number (& build) 3.14.9+180917 => 3.27.23
2021-11-04 15:43 DenisChenu Summary Filenames of uploads starting with special characters truncated => Filenames of uploads starting with special characters truncated with invalid setlocale
2021-11-04 15:43 DenisChenu Webserver software & version (if known) apache 2.4.10 => apache 2.4.10 or nginx
2021-11-04 15:43 DenisChenu PHP Version 5.6.36 => 5.6.36 or 7.4.25
2021-11-04 15:43 DenisChenu Zoho Project Synchronization Yes => |Yes|
2021-11-04 15:47 DenisChenu Note Added: 67118
2021-11-04 15:48 DenisChenu Note Added: 67119
2021-11-10 11:23 tbart Note Added: 67219
2021-11-10 12:00 DenisChenu Note Added: 67226
2021-11-12 10:44 galads Note Added: 67273
2021-11-12 10:44 galads Bug heat 8 => 10
2021-11-12 10:45 DenisChenu Relationship added related to 17718
2021-11-12 10:45 DenisChenu Note Added: 67274
2022-02-04 08:34 DenisChenu Assigned To DenisChenu => c_schmitz
2022-02-05 19:56 c_schmitz Note Added: 68214
2022-02-05 19:56 c_schmitz Bug heat 10 => 12
2022-02-05 19:57 c_schmitz Note Edited: 68214
2022-02-05 19:59 c_schmitz Note Edited: 68214
2022-03-31 11:45 galads Zoho Project Synchronization Yes =>
2022-09-27 13:21 DenisChenu Status confirmed => closed
2022-09-27 13:21 DenisChenu Note Added: 71970
2022-10-26 10:01 DenisChenu Relationship added related to 18432
2023-05-29 11:13 DenisChenu Relationship added has duplicate 18844
2023-05-29 11:13 DenisChenu Bug heat 12 => 20