# Overplotting, Transparency, and Jitter

If we have a very large number of points to plot, or if our numeric variables are discrete-valued, a plain scatterplot may not be informative. The visualization suffers from *overplotting*: so many points overlap that it becomes difficult to see the actual relationship between the plotted variables.

```python
import matplotlib.pyplot as plt
plt.scatter(data = df, x = 'disc_var1', y = 'disc_var2')
```

![](/files/-Li-fAULZkiMkrtNolKh)

In the plot above, we can infer some kind of negative relationship between the two variables, but the degree of variability in the data and the strength of the relationship are unclear. In cases like this, we may want to employ *transparency* and *jitter* to make the scatterplot more informative. Transparency can be added to a [`scatter`](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.scatter.html) call by setting the `alpha` parameter to a value between 0 (fully transparent, invisible) and 1 (fully opaque).

```python
plt.scatter(data = df, x = 'disc_var1', y = 'disc_var2', alpha = 1/5)
```

![](/files/-Li-fHvNX4O2e586ECJ4)

The more points overlap in a region, the darker that region appears. We can now see a moderate negative relationship between the two numeric variables, and that values of 0 and 10 on the x-axis are much rarer than the central values.
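To get a feel for how quickly stacked points darken, the sketch below computes the combined opacity of `n` identical points drawn on top of one another. It assumes standard "over" alpha compositing, where each new layer covers a fraction `alpha` of whatever still shows through; the helper name `stacked_opacity` is hypothetical and not part of matplotlib.

```python
def stacked_opacity(alpha, n):
    """Combined opacity of n overlapping points, each drawn with opacity alpha.

    Assumes standard "over" compositing: every layer lets a fraction
    (1 - alpha) of the background through, so n layers let (1 - alpha)**n through.
    """
    return 1 - (1 - alpha) ** n

# Compare a few stack sizes for the alpha = 1/5 used above.
for n in (1, 2, 5, 10):
    print(n, round(stacked_opacity(1/5, n), 3))
```

Under this assumption, with `alpha = 1/5` two stacked points reach an opacity of about 0.36, five reach about 0.67, and ten reach about 0.89. A lower `alpha` leaves more room to tell dense regions apart before they saturate.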

As an alternative or companion to transparency, we can add jitter, which shifts each point slightly from its true position. matplotlib's `scatter` function has no jitter option, but seaborn's [`regplot`](https://seaborn.pydata.org/generated/seaborn.regplot.html) function has one built in. Jitter can be set independently for x and y, and it only changes where the points are drawn, so it does not affect the fit of a regression line if one is drawn:

```python
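import seaborn as sb
# fit_reg = False hides the regression line, leaving only the jittered scatter
sb.regplot(data = df, x = 'disc_var1', y = 'disc_var2', fit_reg = False,
           x_jitter = 0.2, y_jitter = 0.2, scatter_kws = {'alpha' : 1/3})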
```

The jitter settings cause each point to be plotted at a position drawn uniformly from within ±0.2 of its true value. Note that transparency is now passed as a dictionary assigned to the `scatter_kws` parameter. This is necessary so that transparency is applied specifically to the `scatter` component of the `regplot` function.

![](/files/-Li-fMI6Em9wcCZtJdRv)
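Since matplotlib's `scatter` has no jitter option, a similar effect can be approximated by adding uniform noise to the coordinates before plotting. The following is a minimal sketch, not seaborn's implementation: the `jitter` helper is hypothetical, and the `x` and `y` arrays are synthetic stand-ins for `df['disc_var1']` and `df['disc_var2']`, generated only for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

def jitter(values, width, rng):
    """Return a copy of values shifted by uniform noise in [-width, width)."""
    values = np.asarray(values, dtype=float)
    return values + rng.uniform(-width, width, size=values.shape)

# Synthetic discrete-valued data (illustrative only, not the page's dataset).
rng = np.random.default_rng(0)
x = rng.integers(0, 11, size=500)
y = np.clip(10 - x + rng.integers(-3, 4, size=500), 0, 10)

# Jitter both axes by +/-0.2 and combine with transparency.
plt.scatter(jitter(x, 0.2, rng), jitter(y, 0.2, rng), alpha=1/3)
```

Because the noise is added to a copy, the original values stay untouched for any later calculation. Keeping the width well below half the spacing between adjacent discrete values (here 1) keeps neighboring clusters from blending into each other.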

