[GRASS-user] v.class.mlR Error in data.frame : arguments imply differing number of rows

Dear Moritz and other GRASS users and developers,

I tried dealing with the error myself by changing predicted <- data.frame(predict(models.cv, features)) into predicted <- data.frame(predict(models.cv, features, na.action = na.exclude)), based on discussions online implying some predictions might be invalid NaN values. I checked the script output to see if this change was implemented and it was, but I get the same error.
Any suggestions what to try next?


v.class.mlR -i --overwrite segments_map=nvSegW24IDM4DV4@LUP1 training_map=TrainingApril2019@LUP1 train_class_column=class_code output_class_column=output_class output_prob_column=probability classifiers=svmLinear,rf,xgbTree folds=5 partitions=10 tunelength=10 weighting_modes=bwwv,qbwwv weighting_metric=accuracy classification_results=C:\Users\haarlooj\Documents\CELOS\v.class.mlr_outputapril2019\results_all_classifiers accuracy_file=C:\Users\haarlooj\Documents\CELOS\v.class.mlr_outputapril2019\accuracy_classifiers model_details=C:\Users\haarlooj\Documents\CELOS\v.class.mlr_outputapril2019\details_classifier_module_runs bw_plot_file=C:\Users\haarlooj\Documents\CELOS\v.class.mlr_outputapril2019\box-whicker_classifier_performance r_script_file=C:\Users\haarlooj\Documents\CELOS\v.class.mlr_outputapril2019\R_script4 processes=3
Running R now. Following output is R output.
During startup - Warning messages:
1: Setting LC_CTYPE=en_US.cp1252 failed
2: Setting LC_COLLATE=en_US.cp1252 failed
3: Setting LC_TIME=en_US.cp1252 failed
4: Setting LC_MONETARY=en_US.cp1252 failed
Loading required package: caret
Loading required package: lattice
Loading required package: ggplot2
Warning messages:
1: package ‘caret’ was built under R version 3.5.3
2: package ‘ggplot2’ was built under R version 3.5.3
Loading required package: foreach
Loading required package: iterators
Loading required package: parallel
Warning messages:
1: package ‘doParallel’ was built under R version 3.5.3
2: package ‘foreach’ was built under R version 3.5.3
3: package ‘iterators’ was built under R version 3.5.3
During startup - Warning messages:
1: Setting LC_CTYPE=en_US.cp1252 failed
2: Setting LC_COLLATE=en_US.cp1252 failed
3: Setting LC_TIME=en_US.cp1252 failed
4: Setting LC_MONETARY=en_US.cp1252 failed
During startup - Warning messages:
1: Setting LC_CTYPE=en_US.cp1252 failed
2: Setting LC_COLLATE=en_US.cp1252 failed
3: Setting LC_TIME=en_US.cp1252 failed
4: Setting LC_MONETARY=en_US.cp1252 failed
During startup - Warning messages:
1: Setting LC_CTYPE=en_US.cp1252 failed
2: Setting LC_COLLATE=en_US.cp1252 failed
3: Setting LC_TIME=en_US.cp1252 failed
4: Setting LC_MONETARY=en_US.cp1252 failed
Error in data.frame(id = rownames(features), predicted) :
arguments imply differing number of rows: 17851, 17849
Execution halted
ERROR: There was an error in the execution of the R script.
Please check the R output.
(Thu Apr 11 12:09:19 2019) Command finished (1176 min 9 sec)
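
A minimal sketch of how this message arises in plain R, assuming (as the error text suggests) that the id vector is longer than the table of predictions. The ids and class labels below are made up for illustration and are not taken from the module.

```r
# Three ids, but only two predictions (hypothetical toy values).
ids <- c("101", "102", "103")
predicted <- data.frame(class = c("forest", "water"))

# data.frame() refuses to combine columns of different lengths;
# tryCatch() captures the message instead of halting the script.
err <- tryCatch(
  data.frame(id = ids, predicted),
  error = function(e) conditionMessage(e)
)
print(err)
```

The two numbers at the end of the message are the row counts of the arguments, which is how 17851 ids and 17849 predictions show up in the output above.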

Hi Jamille,

On 15/04/19 14:49, Jamille Haarloo wrote:
> Dear Moritz and other GRASS users and developers,
>
> I tried dealing with the error myself by changing predicted <-
> data.frame(predict(models.cv, features)) into
> predicted <- data.frame(predict(models.cv, features,
> na.action = na.exclude)), based on discussions online implying some
> predictions might be invalid NaN values. I checked the script output to
> see if this change was implemented and it was, but I get the same error.
> Any suggestions what to try next?
Normally, there should be no NA in the features as there is a line:

features <- na.omit(features)

early in the R script. Can you see it in the R_script4 file ?

> Error in data.frame(id = rownames(features), predicted) :
> arguments imply differing number of rows: 17851, 17849
> Execution halted
IDs are taken from the features, and for some reason there are two features which do not have a prediction. It might help if you could find out why.

I cannot test right now, but you might want to check if you can replace

ids <- rownames(features)

with something like

ids <- rownames(predicted)

?

Moritz
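
One way to find out which features lack a prediction is to look for incomplete rows in the feature table. The following is a minimal sketch in base R, assuming that the missing predictions come from NA or NaN values in the input; the column names and ids are hypothetical stand-ins for the segment statistics.

```r
# Toy stand-in for the feature table; names and ids are made up.
features <- data.frame(
  mean_b1 = c(0.2, NA, 0.5, 0.7),
  area    = c(10, 12, NaN, 9),
  row.names = c("101", "102", "103", "104")
)

# Rows with at least one NA/NaN are candidates for being dropped
# by a later prediction step.
incomplete <- !complete.cases(features)
missing_ids <- rownames(features)[incomplete]
print(missing_ids)

# Per-column counts show which statistic is responsible.
na_per_column <- colSums(is.na(features))
print(na_per_column)
```

If the count of incomplete rows matches the gap between the two numbers in the error message, NA values in the input are a likely explanation.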

Hi Moritz,

Thank you! it worked.

I did not find the line nor similar lines of 'features <- na.omit(features)' in the v.class.mlR script/ R_script4 file.

Best,
Jamille

On Mon, Apr 15, 2019 at 11:09 AM Moritz Lennert <mlennert@club.worldonline.be> wrote:

> I cannot test right now, but you might want to check if you can replace
>
> ids <- rownames(features)
>
> with something like
>
> ids <- rownames(predicted)
>
> ?
>
> Moritz

On 16/04/19 14:37, Jamille Haarloo wrote:

> Hi Moritz,
>
> Thank you! it worked.

What worked, exactly ? ;-)

> I did not find the line nor similar lines of 'features <- na.omit(features)' in the v.class.mlR script/ R_script4 file.

Sorry, I am working with a heavily modified version here on my computer currently, and didn't realize that this was part of my local modifications.

I also see that in the manual I actually wrote "The module makes no effort to check the input data for NA values or anything else that might perturb the analyses. It is up to the user to proceed to relevant checks before launching the module."

I could add an na.omit to the code. What is your opinion on that as a user ? Isn't it too invasive to just force this on the user ? I do acknowledge that in my local case it is convenient.

Moritz
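
If an na.omit step were added, it would not have to be silent. A minimal sketch, assuming a feature table with the segment ids as row names (the data below are invented): na.omit() records the rows it removes in the "na.action" attribute, so they can be reported to the user.

```r
# Toy feature table; values and ids are hypothetical.
features <- data.frame(
  mean_b1 = c(0.2, NA, 0.5),
  area    = c(10, 12, 9),
  row.names = c("101", "102", "103")
)

cleaned <- na.omit(features)

# The names of the "na.action" attribute are the row names
# of the rows that were removed.
dropped_ids <- names(attr(cleaned, "na.action"))
if (length(dropped_ids) > 0) {
  message("Dropped ", length(dropped_ids),
          " feature(s) with NA values: ",
          paste(dropped_ids, collapse = ", "))
}
```

Reporting the dropped ids in this way would leave the decision with the user while still letting the run complete.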


Replacing
ids <- rownames(features)
with
ids <- rownames(predicted)
is the only edit I did after the previous try, so this should have solved the error.

If I understood correctly, na.action = na.exclude can help to work around the NA values without deleting rows, but somehow it did not work. The user can always compare the original data rows to the results, right? As in my case, comparing the results file to the vector file shows that 17849 of 17851 segments were classified, as expected from the error. I did not lose any original data and also saved a copy. If you mention in the manual that intermediate NA values might block the analyses and will therefore be omitted from the final results, it should be perfectly fine.
I noticed a minor issue; not all results were added to the vector file - about 686 segments (almost 4% of the data) were somehow missed. Fortunately I do have the results as separate output file.

Best,
Jamille
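
To narrow down which segments were missed, the ids in the results file can be compared with the ids in the vector attribute table. A minimal sketch, assuming both sets of ids have already been read into R as character vectors; the names results_ids and vector_ids and their values are hypothetical.

```r
# Hypothetical ids: 'results_ids' from the classification results file,
# 'vector_ids' from the attribute table of the segments map.
results_ids <- c("101", "102", "103", "104", "105")
vector_ids  <- c("101", "103", "104", "106")

# Classified segments whose id does not occur in the vector table.
not_joined <- setdiff(results_ids, vector_ids)
print(not_joined)

# Segments in the vector table without any classification result.
not_classified <- setdiff(vector_ids, results_ids)
print(not_classified)
```

Looking at a few of the ids in either list may show whether the mismatch follows a pattern.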

On Tue, Apr 16, 2019 at 10:04 AM Moritz Lennert <mlennert@club.worldonline.be> wrote:


> I also see that in the manual I actually wrote "The module makes no
> effort to check the input data for NA values or anything else that might
> perturb the analyses. It is up to the user to proceed to relevant checks
> before launching the module."
>
> I could add an na.omit to the code. What is your opinion on that as a
> user ? Isn't it too invasive to just force this on the user ? I do
> acknowledge that in my local case it is convenient.
>
> Moritz


On 16/04/19 16:50, Jamille Haarloo wrote:

> Replacing
> ids <- rownames(features)
> with
> ids <- rownames(predicted)
> is the only edit I did after the previous try, so this should have solved the error.

Watch out: I'm not sure that this is really equivalent. I think I used the rownames(features) to get the original ids of the features, while rownames(predicted) apparently gives 1,2,3,4,etc. So you will not be able to link the results back to your original features...

> If I understood correctly, na.action = na.exclude can help to work around the NA values without deleting rows, but somehow it did not work. The user can always compare the original data rows to the results, right? As in my case, comparing the results file to the vector file shows that 17849 of 17851 segments were classified, as expected from the error. I did not lose any original data and also saved a copy. If you mention in the manual that intermediate NA values might block the analyses and will therefore be omitted from the final results, it should be perfectly fine.

You are right that na.exclude is probably better here than na.omit. IIUC, this will give you NAs in the prediction output as well.

So I think that na.exclude is probably the safest bet and shouldn't create too much of a surprise for the user.

> I noticed a minor issue; not all results were added to the vector file - about 686 segments (almost 4% of the data) were somehow missed.

That's weird. Maybe this is due to the different ids ?

Moritz
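
A minimal sketch of one way to keep the original ids attached to the predictions, so that rows without a prediction end up as NA instead of shifting the ids. It assumes the ids are the row names of the feature table; a small lm() model stands in for the trained classifiers, and all data are invented for illustration.

```r
# Stand-in model; the module itself trains classifiers, not lm().
train <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                    y = c(1.1, 1.9, 3.2, 3.9, 5.1, 6.0))
model <- lm(y ~ x, data = train)

# Toy feature table with one incomplete row; ids are hypothetical.
features <- data.frame(x = c(1.5, NA, 3.5, 4.5),
                       row.names = c("101", "102", "103", "104"))

# Predict only for complete rows and leave NA for the others,
# so the result has exactly one row per original id.
ok <- complete.cases(features)
predicted <- rep(NA_real_, nrow(features))
predicted[ok] <- predict(model, newdata = features[ok, , drop = FALSE])

results <- data.frame(id = rownames(features),
                      predicted = predicted,
                      stringsAsFactors = FALSE)
print(results)
```

With this layout the result table always has as many rows as the feature table, so the ids can be linked back to the segments, and the NA rows make the unclassified segments visible.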