geom_smooth on influence plots can be wonky/unstable #431

@bbolker

For small data sets the smooth line created by geom_smooth can be very unstable.

Here's an example (reprex below) of an influence plot where the raw y-axis values range from -2 to 1 but the loess line from geom_smooth goes up to about 3.5e10. I don't know what the best approach for solving this is ... Post hoc, you can check the data layer via ggplot_build to see if the smooth component has unrealistically large amplitude, but that seems awkward. (TBH I'm not quite sure why we even need a smooth line here; the trend in this plot doesn't seem to be important.)

Left panel: influence plot as drawn by the see package; middle: ggplot with just the points; right: ggplot with points and geom_smooth().

library(performance)
library(ggplot2)
library(patchwork)

sdata <- data.frame(
  TrtLin = factor(rep(
    c("C1", "C2", "C3", "C4", "S1", "S2", "S3", "S4"),
    rep(c(35L, 63L, 62L, 57L), 2)
  )),
  this_male_mated = rep(c(rep(0:1, 98), 0), 
                        c(
      2L, 2L, 6L, 4L, 2L, 1L, 3L, 2L, 1L, 3L, 3L, 1L, 5L, 1L, 1L, 1L, 1L, 1L, 2L,
      1L, 4L, 2L, 3L, 2L, 1L, 11L, 2L, 2L, 1L, 4L, 3L, 6L, 1L, 1L, 4L, 3L, 2L, 1L,
      7L, 2L, 3L, 2L, 2L, 5L, 2L, 1L, 3L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 3L, 1L, 2L,
      1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 4L, 2L, 2L, 2L, 1L, 1L, 4L, 1L, 1L, 1L,
      1L, 1L, 3L, 1L, 9L, 6L, 1L, 5L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
      2L, 2L, 2L, 1L, 2L, 1L, 7L, 1L, 1L, 2L, 8L, 1L, 2L, 1L, 6L, 1L, 4L, 1L, 1L,
      3L, 3L, 2L, 2L, 1L, 1L, 3L, 1L, 2L, 1L, 1L, 1L, 2L, 8L, 3L, 2L, 3L, 3L, 1L,
      4L, 1L, 2L, 3L, 3L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 3L, 5L,
      1L, 2L, 2L, 2L, 3L, 1L, 3L, 4L, 3L, 2L, 1L, 5L, 4L, 1L, 1L, 3L, 1L, 3L, 1L,
      1L, 1L, 1L, 1L, 3L, 2L, 1L, 3L, 3L, 5L, 1L, 1L, 3L, 1L, 2L, 1L, 2L, 2L, 3L,
      3L, 2L, 2L, 1L, 2L, 6L, 1L
    )
  )
)

themodel <- glm(this_male_mated ~ TrtLin,
                data = sdata,
                family = binomial)


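# Influence (outlier) diagnostics for the fitted model
dd <- check_model(themodel, check = "outliers")
p1 <- plot(dd)                       # left panel: influence plot from plot()
p0 <- ggplot(dd$INFLUENTIAL, aes(Hat, Std_Residuals)) + geom_point()
p2 <- p0                             # middle panel: points only
p3 <- p0 + geom_smooth(se = FALSE)   # right panel: points plus default smooth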

p1 + p2 + p3 + plot_layout(nrow = 1, ncol = 3)
ggsave("plot123.png")
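
The post hoc check mentioned above could look roughly like the sketch below. It is only an illustration, not something ggplot2, performance, or see provides: the helper name smooth_out_of_range, its layer-index arguments, and the tolerance tol are all made up here, and the toy data is not the data from this issue. It pulls the computed layer data out of ggplot_build() and flags a smooth whose fitted values stray far outside the range of the plotted points.

```r
library(ggplot2)

# Hypothetical helper (not part of any package): TRUE if the smooth layer's
# fitted y values go more than `tol` times the points' y-span beyond the
# range of the points themselves.
smooth_out_of_range <- function(p, point_layer = 1, smooth_layer = 2, tol = 1) {
  built <- ggplot_build(p)
  y_pts <- built$data[[point_layer]]$y      # y values of the raw points
  y_smooth <- built$data[[smooth_layer]]$y  # fitted values of the smooth
  y_range <- range(y_pts, na.rm = TRUE)
  margin <- tol * diff(y_range)
  any(
    y_smooth < y_range[1] - margin | y_smooth > y_range[2] + margin,
    na.rm = TRUE
  )
}

# Toy data with a well-behaved linear fit, for illustration only
d <- data.frame(x = 1:20, y = (1:20) / 10 + rep(c(-0.1, 0.1), 10))
p_ok <- ggplot(d, aes(x, y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

smooth_out_of_range(p_ok)
```

With the reprex above, the same helper could be called on p3, assuming the points are layer 1 and the smooth is layer 2. The choice of tol is arbitrary, which is part of why this kind of check feels awkward.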
