View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
04006 | Bug reports | Other | public | 2009-12-17 22:39 | 2010-02-11 17:32 |
Reporter | erick | Assigned To | |||
Priority | normal | Severity | minor | ||
Status | closed | Resolution | fixed | ||
Product Version | 1.85+ | ||||
Fixed in Version | 1.87+ | ||||
Summary | 04006: Possible inaccurate data when exporting to R | ||||
Description | When exporting answers to R format, the script generated by LimeSurvey may create inaccurate data by converting factors to numeric values with the function as.numeric(). This function's output is in fact the indices of the levels that appear in the factor.
| ||||
Tags | No tags attached. | ||||
Attached Files | export_data_r.php.patch (1,234 bytes)
--- export_data_r.php.~1~ 2009-07-21 11:30:33.000000000 -0300
+++ export_data_r.php 2009-12-16 21:50:06.000000000 -0200
@@ -30,7 +30,7 @@
 * Optimization opportunities remain in the VALUE LABELS section, which runs a query / column
 */
-$length_varlabel = '255'; // Set the max text length of Variable Labels
+$length_varlabel = '25500'; // Set the max text length of Variable Labels
 $headerComment = '';
 $tempFile = '';
@@ -160,7 +160,7 @@
 * be sent to the client.
 */
 echo $headerComment;
- echo "data=read.table(\"survey_".$surveyid."_data_file.csv\", sep=\",\", quote = \"'\", na.strings=\"\")\n names(data)=paste(\"V\",1:dim(data)[2],sep=\"\")\n";
+ echo "data=read.table(\"survey_".$surveyid."_data_file.csv\", sep=\",\", quote = \"'\", na.strings=\"\")\n names(data)=paste(\"V\",1:dim(data)[2],sep=\"\", stringAsFactors=FALSE)\n";
 foreach ($fields as $field){
 if($field['SPSStype'] == 'DATETIME23.2') $field['size']='';
 if($field['LStype'] == 'N' || $field['LStype']=='K') {
@@ -236,7 +236,7 @@
 }
 }
 }
- echo "NA); names(data)= v.names[-length(v.names)]\nprint(str(data))\n";
+ echo "NA); names(data)= v.names[-length(v.names)]\nrm(v.names)\n";
 echo $errors;
 exit;
 }
export_data_r2.php.patch (1,235 bytes)
--- export_data_r.php.~1~ 2009-07-21 11:30:33.000000000 -0300
+++ export_data_r.php 2009-12-21 16:51:51.000000000 -0200
@@ -30,7 +30,7 @@
 * Optimization opportunities remain in the VALUE LABELS section, which runs a query / column
 */
-$length_varlabel = '255'; // Set the max text length of Variable Labels
+$length_varlabel = '25500'; // Set the max text length of Variable Labels
 $headerComment = '';
 $tempFile = '';
@@ -160,7 +160,7 @@
 * be sent to the client.
 */
 echo $headerComment;
- echo "data=read.table(\"survey_".$surveyid."_data_file.csv\", sep=\",\", quote = \"'\", na.strings=\"\")\n names(data)=paste(\"V\",1:dim(data)[2],sep=\"\")\n";
+ echo "data=read.table(\"survey_".$surveyid."_data_file.csv\", sep=\",\", quote = \"'\", na.strings=\"\", stringsAsFactors=FALSE)\n names(data)=paste(\"V\",1:dim(data)[2],sep=\"\")\n";
 foreach ($fields as $field){
 if($field['SPSStype'] == 'DATETIME23.2') $field['size']='';
 if($field['LStype'] == 'N' || $field['LStype']=='K') {
@@ -236,7 +236,7 @@
 }
 }
 }
- echo "NA); names(data)= v.names[-length(v.names)]\nprint(str(data))\n";
+ echo "NA); names(data)= v.names[-length(v.names)]\nrm(v.names)\n";
 echo $errors;
 exit;
 } | ||||
Bug heat | 8 | ||||
Complete LimeSurvey version number (& build) | 7191 | ||||
I will donate to the project if issue is resolved | |||||
Browser | |||||
Database type & version | 138 | ||||
Server OS (if known) | Linux Debian | ||||
Webserver software & version (if known) | Apache | ||||
PHP Version | 5.2.6 | ||||
@ mdekker: what do you think about that issue? |
|
Hey Livio, As you did the R export and this only involves R code I'll leave this one to you. If you are okay with the changes please commit the patch and close the report. |
|
Hey Livio, I will commit the patch, no problem. |
|
Not committing (yet) as the stringAsFactors line seems to be giving trouble: all set factors are appended at the end of the dataframe. |
|
@erick, can you please provide an export where you found the data was inaccurate? Please make the export as simple (small) as possible. |
|
I've uploaded an R script and a corresponding CSV file. Note that the variable "Q1" is read incorrectly: it is converted from a factor with levels ("", 4, 1), so the level indices are stored instead of the answers. The previously uploaded patch is incorrect: the option stringsAsFactors = FALSE was given to the wrong function. I've also uploaded a corrected version of the patch. Sorry for the confusion. |
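The conversion problem described in this note can be reproduced with a few lines of R. This is a minimal sketch, not the uploaded script: the vector `q1` and its values are hypothetical, chosen only to mirror the reported levels ("", 4, 1).

```r
# Hypothetical answers, with the levels in the order given in the report.
q1 <- factor(c("", "4", "1", "4"), levels = c("", "4", "1"))

# as.numeric() on a factor returns the internal level indices, not the answers.
codes <- as.numeric(q1)

# Going through character first recovers the stored answers; "" becomes NA.
answers <- suppressWarnings(as.numeric(as.character(q1)))

print(codes)    # 1 2 3 2
print(answers)  # NA 4 1 4
```

Here the answer 4 is silently stored as 2 and the answer 1 as 3, which is the kind of inaccuracy the report is about.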
|
Now I see the problem :) The first patch had stringAsFactors, the second has stringsAsFactors (so stringS). That's why moving the code to the correct place didn't work for me :) I'll try again with this change on my problem dataset. Thanks for clarifying and attaching the example! |
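As a side note on why the first patch misbehaved without raising an error: paste() accepts any extra named argument as one more value to paste. The sketch below uses three hypothetical columns instead of dim(data)[2].

```r
# First patch: the misspelled option went to paste(), which treats it as
# another value to concatenate, so "FALSE" ends up in every name.
wrong <- paste("V", 1:3, sep = "", stringAsFactors = FALSE)

# Intended column names, as in the original export script.
right <- paste("V", 1:3, sep = "")

print(wrong)  # "V1FALSE" "V2FALSE" "V3FALSE"
print(right)  # "V1" "V2" "V3"
```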
|
It seems to work OK now, but I get warnings for all the missing values. Do you have a clue how to fix that? The warning is: In eval.with.vis(expr, envir, enclos) : NAs introduced by coercion. I think the missing values are incorrectly exported as "" and should be just empty. I fixed it in my install and it seems to work perfectly: no more warnings, and correct NA values (as far as a quick scan shows me). |
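The difference between the two ways of exporting a missing value can be sketched as follows. The CSV lines and the helper `read_export` are hypothetical; only the read.table() options are taken from the patched export script. Because quote is set to "'", a pair of double quotes is not treated as quoting and is read as a literal two-character string.

```r
# Hypothetical two-column exports (id, answer); row 2 has a missing answer.
csv_quoted <- c('1,4', '2,""', '3,1')  # missing value written as ""
csv_empty  <- c('1,4', '2,',   '3,1')  # missing value written as an empty field

# Same options as the generated read.table() call in the corrected patch.
read_export <- function(lines) {
  read.table(text = lines, sep = ",", quote = "'", na.strings = "",
             stringsAsFactors = FALSE)
}

d_quoted <- read_export(csv_quoted)
d_empty  <- read_export(csv_empty)

str(d_quoted$V2)  # character column; as.numeric() on it warns about coercion
str(d_empty$V2)   # numeric column with a proper NA, no conversion needed
```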
|
Coming back to this: stringsAsFactors doesn't seem to do much. The change for the missing values seems to be the important one. In R we always get a factor with numbers from 1 to x plus the value labels. The original answer value is not stored! This is something to think about. read.spss does the same thing when I read my SPSS file into a data frame, and the R help warns against using the values from the vector. If erick or livio can come up with a solution for this, we can see how to implement it. So the question "How many children do you have?" would in SPSS get the values 1, 3, 5 with the corresponding value labels, but in R it would become a factor with values 1, 2, 3 and the labels. If you want to calculate a score based on the actual answer value, a factor doesn't seem to be the best approach. I don't know whether there is a data type better fit for the job, or whether using just a number with some extra attributes would help. I am committing the patch including stringsAsFactors and leaving the topic open to solve the answer value vs. answer label problem. |
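One possible shape for the "number with some extra attributes" idea is sketched below. This is only an illustration of the open question, not what was committed: the answers, the labels and the attribute name "value.labels" are all assumptions.

```r
# Hypothetical coded answers to "How many children do you have?".
children <- c(1, 3, 5, 3)
value_labels <- c("one" = 1, "three" = 3, "five" = 5)  # hypothetical labels

# Factor approach: the codes 1, 3, 5 are replaced by the level indices 1, 2, 3.
as_factor <- factor(children, levels = unname(value_labels),
                    labels = names(value_labels))
factor_codes <- as.numeric(as_factor)

# Alternative sketch: keep the numbers and attach the labels as an attribute
# (the attribute name is an assumption made for this example).
attr(children, "value.labels") <- value_labels
score <- sum(children)  # scores use the real answer values

# Labels can still be looked up on demand.
labelled <- names(value_labels)[match(children, value_labels)]

print(factor_codes)  # 1 2 3 2
print(score)         # 12
print(labelled)      # "one" "three" "five" "three"
```

The trade-off is that functions expecting a factor (for example for tabulation by label) would need the lookup step, while score calculations work directly on the stored values.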
|
Hi, I committed my proposal. I just changed the read.csv line. Unfortunately I don't have the time to check whether it works now. Those values are now treated as missing data. |
|
I fixed a problem with the \ (one more was needed) :) Seems to work OK for me now. |
|
nice Menno thanks again!! |
|
Date Modified | Username | Field | Change |
---|---|---|---|
2009-12-17 22:39 | erick | New Issue | |
2009-12-17 22:39 | erick | Status | new => assigned |
2009-12-17 22:39 | erick | Assigned To | => user372 |
2009-12-17 22:39 | erick | File Added: export_data_r.php.patch | |
2009-12-17 22:39 | erick | LimeSurvey build number | => 7191 |
2009-12-17 22:39 | erick | Database & DB-Version | => 138 |
2009-12-17 22:39 | erick | Operating System (Server) | => Linux Debian |
2009-12-17 22:39 | erick | Webserver | => Apache |
2009-12-17 22:39 | erick | PHP Version | => 5.2.6 |
2009-12-18 01:13 | | Assigned To | user372 => mdekker |
2009-12-18 01:13 | | Note Added: 10599 | |
2009-12-18 09:10 | mdekker | Note Added: 10602 | |
2009-12-18 09:10 | mdekker | Assigned To | mdekker => user1548 |
2009-12-18 09:10 | mdekker | Status | assigned => feedback |
2009-12-21 09:59 | mdekker | Note Added: 10615 | |
2009-12-21 11:13 | mdekker | Note Added: 10616 | |
2009-12-21 11:15 | mdekker | Note Added: 10617 | |
2009-12-21 22:04 | erick | File Added: Surveydata_syntax.R | |
2009-12-21 22:04 | erick | File Added: survey_99164_data_file.csv | |
2009-12-21 22:17 | erick | Note Added: 10633 | |
2009-12-21 22:18 | erick | File Added: export_data_r2.php.patch | |
2009-12-22 09:53 | mdekker | Note Added: 10635 | |
2009-12-22 10:05 | mdekker | Note Added: 10636 | |
2009-12-22 11:09 | mdekker | Note Added: 10639 | |
2009-12-22 20:44 | | Note Added: 10642 | |
2009-12-23 12:27 | mdekker | Note Added: 10658 | |
2009-12-23 12:45 | | Note Added: 10659 | |
2010-02-11 17:32 | c_schmitz | Status | feedback => closed |
2010-02-11 17:32 | c_schmitz | Resolution | open => fixed |
2010-02-11 17:32 | c_schmitz | Fixed in Version | => 1.87+ |
2010-05-06 10:27 | c_schmitz | Category | Import / Export => (No Category) |