October 25, 2023
When visualizing data, the typical approach (besides tables) is to plot it. A huge variety of tools exists for this, including MS Excel, matplotlib, gnuplot, matlab, and many more. As my main publication outlet consists of research papers, which are usually created as PDFs through LaTeX, there is also the possibility of creating the plots directly in LaTeX with the help of tikz and pgfplots. As I use tikz anyway, this is the "natural" way for me. Other people might not be seasoned tikz users and feel overwhelmed by pgfplots. Therefore I want to show a few techniques and general settings that make typesetting plots with pgfplots easy. This blog post is not geared towards any specific profession, but as I'm an electrical engineer I will be using (among others) code examples from matlab/octave, as these are common tools. I figure that similar methods for data preparation and plotting comparisons are also possible with, say, R, but I have no experience with that.
First off, let's talk about obtaining data.
Typically one wants to show the output of some simulation, some processing on externally acquired data or statistical analysis, etc.
Therefore, it is a reasonable assumption that the data can be saved as some sort of separator-delimited file, for instance comma-separated values (CSV).
This is a good fit for pgfplots, whose \addplot table command can read delimited data like this.
Often I also use data that is just delimited by spaces, which works perfectly fine for numbers-only data (data with strings might break this).
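As a minimal sketch of reading comma-separated data: pgfplots splits columns at whitespace by default, so a CSV file needs the col sep key. The file name data.csv and the column names t and y are placeholders for illustration only.

```latex
\begin{tikzpicture}
\begin{axis}
% data.csv, t and y are hypothetical names;
% col sep = comma makes the table reader split columns at commas
% instead of the default whitespace
\addplot table[col sep = comma, x = t, y = y] {data.csv};
\end{axis}
\end{tikzpicture}
```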
If you have data in matlab/octave (either obtained directly in matlab/octave or post-processed), the easiest way to create files for plotting is dlmwrite or csvwrite (note that in matlab there is also writematrix, but I don't have any experience with that).
A typical way of doing this is something along the lines of the following (I took this from some simulation code that I wrote a while ago):
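filename = 'foobar.dat';
% write the header line with the column names first
file = fopen(filename, 'w');
fprintf(file, '%s\t%s\t%s\t%s\n', 't', 'y', 'yth', 'vctrl');
fclose(file);
% append the data as tab-separated columns (time vector scaled by 1e9);
% the explicit "..." continuations keep the call valid in matlab, too
dlmwrite( ...
filename, ...
[1e9 * t', y, yth, vctrl], ...
'append', 'on', ...
'delimiter', '\t', ...
'precision', '%+.5f' ...
);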
This is pretty straightforward.
The only gotchas are setting the right precision (highly dependent on the data), choosing the proper delimiter and creating the matrix that is written.
Note that I had to transpose the time vector t in this example, but that again depends on the data (one of the reasons why I don't like programming in matlab/octave).
This creates a file like the following:
t y yth vctrl
-0.00000 +0.00000 +0.00000 +0.00000
+0.00500 +0.01413 +0.00000 +0.00000
+0.01000 +0.02826 +0.00000 +0.00000
+0.01500 +0.04235 +0.00000 +0.00000
+0.02000 +0.05640 +0.00000 +0.00000
+0.02500 +0.07040 +0.00000 +0.00000
+0.03000 +0.08432 +0.00000 +0.00000
+0.03500 +0.09816 +0.00000 +0.00000
This is a suitable file for plotting with pgfplots. Of course, other formats are possible, but this is what I usually employ.
The following section will discuss the basic configuration of pgfplots. After that, I will go into a few details of plotting, followed by a few tips on more exotic topics.
My goal is to obtain a configuration that allows us later on to plot data with only two-ish commands. For this, the configuration should take all details out of the picture by providing sane defaults that only need to be overwritten in special circumstances. Luckily, the pgfplots defaults already take care of many things, but I still have around 30 lines of settings that I use in virtually every document.
Let's look at some samples with the following plotting code:
\begin{axis}
[
xlabel = x,
ylabel = y,
]
\addplot {x};
\addplot {x + 1};
\addplot {x + 2};
\addplot {x + 3};
\end{axis}
Without any further configuration, this plot looks like this:
It does not look bad (at least the fonts are readable and match the document font), but I don't really like the colors and I don't want to have the dots on the data points.
Furthermore, there is no grid and no axis labels (granted, we did not provide any).
For the following configuration, all the settings go into a \pgfplotsset{ }.
Furthermore, the axis-related settings are discussed first.
There are different ways of setting defaults for axis plots; I use every axis/.append style = { }, but some code also goes in the general settings. All in all, our configuration looks like this:
\pgfplotsset{
every axis/.append style = {
% settings go in here
},
% and some settings go in here
}
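For orientation, the following sketch shows where such a block typically lives, namely in the document preamble. The document class and the compat level are assumptions for illustration; the compat value should match the installed pgfplots version.

```latex
\documentclass{article}
\usepackage{pgfplots}
% compat = 1.18 is an assumed version, adjust as needed
\pgfplotsset{compat = 1.18}
\pgfplotsset{
  every axis/.append style = {
    % axis-related defaults
  },
  % general defaults
}
\begin{document}
\begin{tikzpicture}
\begin{axis}
\addplot {x};
\end{axis}
\end{tikzpicture}
\end{document}
```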
Let's start with the grid. I like to have a grid on the major ticks but also the minor ticks (if there are any). This is a simple setting:
grid = both,
The grid style can also be set, but this code does not live in the every axis/.append style
section.
grid style={
black,
line width = 0.3pt,
dash pattern = on 0pt off 1.2pt,
line cap = round
},
The shown lines configure the appearance of the grid as black (I don't like grey grids) with a thin line.
The last two lines turn the grid lines into rows of little circles by using a trick.
The dash pattern key allows setting an "on"-width and an "off"-width.
The "on"-width is zero, which seems weird at first.
But as the line cap is round, the end caps of each zero-length dash (two half circles) are still drawn, which ends up as a full circle with no line in-between.
The "off"-width then just sets the spacing of the circles.
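The trick can be tried outside of pgfplots with plain tikz. The following sketch draws a regular thin line, the dotted grid style, and an exaggerated version in which the individual circles are clearly visible; the coordinates and the enlarged widths are arbitrary values chosen for illustration.

```latex
\begin{tikzpicture}
% regular thin line for comparison
\draw[line width = 0.3pt] (0, 0.4) -- (4, 0.4);
% zero-length dashes with round caps: a row of dots spaced 1.2pt apart
\draw[line width = 0.3pt, dash pattern = on 0pt off 1.2pt, line cap = round]
  (0, 0) -- (4, 0);
% exaggerated widths: each dot has the diameter of the line width
\draw[line width = 3pt, dash pattern = on 0pt off 6pt, line cap = round]
  (0, -0.4) -- (4, -0.4);
\end{tikzpicture}
```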
After the grid we configure the appearance of the ticks. Per default, scaled ticks are used, which puts a common factor (like 1e9) next to the axis. Personally, I find this quite ugly, so let's disable it:
scaled ticks = false,
Furthermore, I set the general appearance with the following settings:
tick align = inside,
major tick length = {4pt},
minor tick length = {2pt},
every tick/.append style = {semithick},
These are just settings that I like and I think make sense. I tried out a few different versions, so I encourage you to do the same, but these settings go well with the rest of my configuration.
tick align determines where exactly the ticks are drawn.
Other possible values are center and outside.
The former draws ticks inside and outside, which I, frankly, find a little insane.
The latter can be ok for some people, I guess, but I don't like it.
I want my stuff inside the axis.
Furthermore, I use longer ticks for the major ones; the minor ones are a bit shorter.
Lastly, the tick style I use makes them a little thicker than normal, but as you will see I use thicker lines for most of my stuff.
For the axis size I use the following:
scale only axis,
width=0.7\linewidth,
height=2.5cm,
This sets all plots to the same size.
The first key (scale only axis
) ensures that the axis size does not depend on the labels and tick labels.
I think this behaviour makes much more sense than the default, as this makes it much easier to scale all plots in one document to the same size.
I like all plots to have the same width and height, although some plots require special settings:
For special plots I sometimes set a larger height, and a width set this way does not allow more than one plot per page column.
Note that the setting for the width does not use the entire \linewidth
, as there needs to be room for the labels.
70 percent of the line width is (in my experience) a good value for two columns, so it might have to be adjusted for one-column page layouts.
Next up, the line styles:
As I already mentioned, I like thicker lines.
Therefore, I set both the width of the axis lines as well as the actual plots to a thicker width.
The axis width goes into the settings of every axis/.append style
just as before:
axis line style = {thick},
The width of the plots is configured like this (but this does not go into the every axis/.append style section):
every axis plot/.style={thick, line join = round},
So far we have configured the appearance of the axis, but not the plot itself (well, only the line thickness).
Currently they are still blue with markers.
The best way to fix this is to set a cycle list.
This defines what consecutive plots look like.
The first plot uses the first entry, the second plot the second entry and so on.
Once the end of the list is reached, it starts from the beginning.
But reaching the end of the cycle list is a good indicator that the respective axes have too many plots.
For the setting of the cycle list I only consider black-and-white plots, as most of my axes plots are geared towards a potential black-and-white print. I show my settings here, but this is a good place to make changes.
cycle list = {
solid,
{dash pattern = on 1pt off 1.5pt},
{dash pattern = on 8pt off 4pt},
{dash pattern = on 1pt off 0.4pt},
},
I don't make too heavy use of the cycle list. If I have more than one or two plots in an axis, I explicitly set the color or the line style when adding the plot. But this cycle list also gets rid of the markers, so that's also helpful.
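Setting the style explicitly when adding a plot can be sketched as follows. Options given to \addplot in brackets replace the cycle list entry for that plot, whereas \addplot+ keeps the entry and appends the given options. The particular styles (dashed, red) are arbitrary choices for illustration.

```latex
\begin{tikzpicture}
\begin{axis}
% takes its style from the cycle list
\addplot {x};
% the options in brackets replace the cycle list entry for this plot
\addplot[dashed] {x + 1};
% \addplot+ keeps the cycle list entry and appends the given options
\addplot+[red] {x + 2};
\end{axis}
\end{tikzpicture}
```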
This leaves only the legend to be configured. Usually I tend to avoid legends (I prefer markers such as arrows or I explain the difference in the figure caption or something similar), but for some plots it gets messy without legends. In my experience there is no one style to rule them all, therefore I will only give some hints. Legend appearance is something that also pops up in local settings for a single plot.
In general, for the position of the legend there are two possible options: inside or outside of the axis. As not all my plots in a paper have legends, I don't like to position legends outside of the axis, as this would mean that I have to allocate space for them. This in turn messes up the alignment of the plots, or I have to make a compromise and (unnecessarily) reduce the width of plots without legends. Therefore, legends go inside of the plot. There, the position depends on the data. I try to put them in the upper-right corner. If this does not work out, the legend needs to be tweaked in the local settings. Furthermore, I reduce the font slightly (to make it easier to fit a legend into a plot) and left-align the cells. Lastly, for default legends I don't want them to obscure parts of the plot, therefore I turn off the boundary and the filling:
legend cell align = left,
legend pos = {north east},
every axis legend/.append style = {
font = \small,
draw = none,
fill = none
},
Often my plots have quite serious space constraints, therefore the legend cannot increase the plot size. If it is possible to slightly decrease the plot resolution (by increasing the data range), the plot can make some room for legends without increasing the size of the axis. This allows the inclusion of legends in these kinds of plots. An example for this is shown in the next image, which is taken from one of my papers:
Here, the data range is slightly higher than required, but the plot is still readable.
Sometimes it is also possible to show more data that is not interesting in order to "sacrifice" it. Then I use a legend that draws over parts of the plot. For this, I define a style that I then can just use in the plot by listing it in the local settings. Again, this makes sure that all legends that are drawn like this look the same:
obscuring legend/.style = {
every axis legend/.append style = {
draw = black,
fill = white,
}
},
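Using the style then only requires listing it in the local settings of an axis. The following sketch additionally moves the legend to another corner; the plotted functions and the chosen position are placeholders for illustration.

```latex
\begin{tikzpicture}
\begin{axis}
[
  obscuring legend,        % boxed legend with white background
  legend pos = south east, % local override of the default position
]
\addplot {x};
\addlegendentry{$x$}
\addplot {x + 1};
\addlegendentry{$x + 1$}
\end{axis}
\end{tikzpicture}
```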
The entire configuration is shown here:
\pgfplotsset{
every axis/.append style={
grid=both,
enlarge x limits = false,
enlarge y limits = auto,
%rotate ylabel,
width=0.7\linewidth,
height=2.5cm,
scale only axis,
scaled ticks=false,
axis line style = {thick},
tick align = inside,
major tick length = {4pt},
minor tick length = {2pt},
every tick/.append style = {semithick, black},
ylabel style = {minimum height = 1cm}
},
every axis plot/.style={thick, line join = round},
grid style={
black,
line width = 0.3pt,
dash pattern = on 0pt off 1.2pt,
line cap = round
},
legend cell align = left,
legend pos = {north east},
every axis legend/.append style = {
font = \small,
draw = none,
fill = none
},
obscuring legend/.style = {
every axis legend/.append style = {
draw = black,
fill = white,
}
},
cycle list = {
solid,
{dash pattern = on 1pt off 1.5pt},
{dash pattern = on 8pt off 4pt},
{dash pattern = on 1pt off 0.4pt},
},
}
With the shown configuration, the test plot now looks like this:
I think this is much better and a good foundation for beautiful plots (the aspect ratio is not ideal, as it was designed for a two-column layout; the test image was created in a standalone document, which uses different settings).
Of course it needs to be filled with data that is more interesting.
This also introduces some specialities with regard to useful commands and keys.
I will get into this further on.
The majority of plots that I show in my publications are time-domain or frequency-domain signal plots. Here, I like to have precise control over the x-axis, whereas I'm ok with having some visual margin for the y-axis. There are two keys that control these margins. According to my constraints I set them like this:
enlarge x limits = false,
enlarge y limits = auto,
With this, the x-axis covers exactly the given range (explicitly or inferred from the data), whereas the y-range is enlarged if there is no explicit range given. This is usually what I want, but these are keys I occasionally also set locally. Furthermore, this is definitely a matter of taste and type of displayed data.
The grid in an axis consists of a major grid and a minor grid.
The minor grid is drawn at the location of the minor ticks, which are unnumbered.
Per default these are not drawn, but I think the appearance with a finer grid is pleasing.
I typically try to add minor ticks in a way that makes sense for the data (e.g. three divisions are usually a bit unfitting, but five are fine) and that results in little squares (from the crossings of the x- and the y-grid lines).
The keys for adding minor ticks are minor x tick num and minor y tick num.
A value of 1 adds one minor tick between two major ticks, so to have five divisions the value for the minor tick num should be four.
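As a small sketch of the relation between the key value and the number of divisions: with major x ticks every 5 units, four minor ticks split each interval into five divisions of width 1. The axis range, the tick positions and the plotted function are arbitrary values chosen for illustration.

```latex
\begin{tikzpicture}
\begin{axis}
[
  xmin = 0, xmax = 10,
  xtick = {0, 5, 10},   % major ticks every 5 units
  minor x tick num = 4, % four minor ticks, i.e. five divisions of width 1
  minor y tick num = 1, % one minor tick, i.e. two divisions
]
\addplot[domain = 0:10] {x};
\end{axis}
\end{tikzpicture}
```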
Creating basic plots with pgfplots is pretty simple:
Put an axis
environment in a tikzpicture
and issue some \addplot
calls.
As an example, let's take the data file of which I showed an excerpt above.
As a filename we assume data.dat.
In matlab/octave, a typical code to plot the data would look like this:
data = dlmread('data.dat', '\t', 1, 0); % skip the header line (row offset 1)
figure;
plot(data(:, 1), data(:, 2));
grid on;
xlabel('t (s)');
ylabel('y (V)');
Note that with matrices in the right format the plot call can be simpler (without giving explicit dimensions), but I show it like this to demonstrate the switch to pgfplots.
The plot generated by octave (matlab is slightly different) looks like this:
Oof. The fonts are practically not readable (I guess this is related to my screen resolution) and the lines are thin. Granted, these are the defaults, but even the pgfplots defaults are far better than this. Of course you can fix these things also in matlab/octave, but my experience shows that many researchers include barely-readable figures in their papers, matlab/octave being one of the culprits.
Let's look at the same data, this time with our newly-configured pgfplots:
\begin{tikzpicture}
\begin{axis}
[
xlabel = {t (s)},
ylabel = {y (V)}
]
\addplot table {data.dat};
\end{axis}
\end{tikzpicture}
This yields:
Much better. The line widths are nice, the fonts match (well, not on the website, but in the document) and are readable.
Now let's plot more data.
The data file contains three traces (including the time vector it contains four columns).
We can plot all three of these by explicitly stating which one to use for the \addplot table command.
In this example I use the x = and y = keys to select the data.
This works as the first line in the data file defines these names.
If this line is not present, one can also use x index = and y index =.
Please refer to the pgfplots manual for further information.
\begin{tikzpicture}
\begin{axis}
[
xlabel = {t (s)},
ylabel = {y (V)}
]
\addplot table[x = t, y = y] {data.dat};
\addplot table[x = t, y = yth] {data.dat};
\end{axis}
\end{tikzpicture}
With this, the modified plot looks like this:
Neat! Note that I did not modify any line styles here. This relies on the cycle list to provide the correct styles. This means that any changes in the cycle list can change all the plots in the entire document, which is a good feature.
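For a data file without a header line, the same plot can be sketched with column indices instead of names. The indices are zero-based, so with the column order from above, column 0 is the time and columns 1 and 2 are the first two traces.

```latex
\begin{tikzpicture}
\begin{axis}
[
  xlabel = {t (s)},
  ylabel = {y (V)}
]
% column indices start at 0: 0 is t, 1 is y, 2 is yth
\addplot table[x index = 0, y index = 1] {data.dat};
\addplot table[x index = 0, y index = 2] {data.dat};
\end{axis}
\end{tikzpicture}
```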
In the above example, there are two signals in one plot. This means that there should be a way for the reader to distinguish them by some means. Typically one would use a legend. In this example the two signals are tightly related (the solid line is the dashed signal processed by a track-and-hold circuit). Therefore I would label these signals in the figure caption and in the text, not in the plot, as for someone reading the related paper/article/presentation the difference would be clear. However, in order to show this I will add a legend to the plot. Additionally, also some minor ticks are added:
\begin{tikzpicture}
\begin{axis}
[
xlabel = {t (s)},
ylabel = {y (V)},
minor x tick num = 4,
minor y tick num = 1,
obscuring legend
]
\addplot table[x = t, y = yth] {data.dat};
\addlegendentry{yth}
\addplot table[x = t, y = y] {data.dat};
\addlegendentry{y}
\end{axis}
\end{tikzpicture}
This makes for a nice, readable plot. The legend obscures the plots a bit, but in this case I think that is better than increasing the y-axis range.
In the above example, the data file has a third signal, which is a bit different than the other two. Therefore, it would be nice to put it in an extra plot, just like with subplots in matlab/octave. Here, the following is a typical code to produce subplots:
data = dlmread('data.dat', '\t', 1, 0); % skip the header line (row offset 1)
figure;
subplot(2, 1, 1);
plot(data(:, 1), data(:, 2), data(:, 1), data(:, 3));
grid on;
xlabel('t (s)');
ylabel('y (V)');
subplot(2, 1, 2);
plot(data(:, 1), data(:, 4));
grid on;
xlabel('t (s)');
ylabel('vctrl (V)');
In pgfplots, there are several ways to achieve this.
There is direct support for this in the form of the groupplots library, but I don't use that as I found that I often could not really achieve what I wanted in a simple way.
This may have changed, as it has been a while since I last looked at that library. In any case, I will be showing a more basic approach.
The key is to put several axis environments in one tikzpicture. This allows placement of the axes in a tikz manner (as named nodes, relative to each other) and it also enables drawing from one axis into the other in an easy way (not needed very often, but still nice).
An axis is like a node in that it has anchors that can be used for placement.
This means that if we want to put an axis on top of another, we can place its south
anchor on the north
anchor of the lower axis.
Tikz supports this with the following syntax (which then also works for axis environments):
\node[anchor = south, at = (otheraxis.north)] ...
where otheraxis is an already-existing axis that we named like this (see the key name).
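Applied to two axis environments, this could look like the following sketch, which places a second axis below a named one. The name upperaxis and the plotted functions are hypothetical and only serve as an illustration.

```latex
\begin{tikzpicture}
\begin{axis}
[
  name = upperaxis, % hypothetical name, used for the placement below
]
\addplot {x};
\end{axis}
\begin{axis}
[
  % put the north anchor of this axis on the south anchor of the named one
  at = {(upperaxis.south)},
  anchor = north,
  yshift = -4pt,
]
\addplot {x^2};
\end{axis}
\end{tikzpicture}
```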
But in this case there is also a simpler way:
I typically use the property that all axes are (per default) placed at the same point.
I guess it is (0, 0), but it does not really matter, since the bounding box is established around the drawn content after all drawing is done, regardless of whether all coordinates have an offset of, say, 1000 or not.
This does not affect the final result.
Therefore, in order to place two axes on top of each other, I use the south anchor for the upper one and the north anchor for the lower one.
The replication of the above matlab/octave example in pgfplots then looks like this:
\begin{tikzpicture}
\begin{axis}
[
anchor = south,
ylabel = {$y$ (V)},
clear xticklabels,
minor x tick num = 4,
minor y tick num = 1,
obscuring legend
]
\addplot table[x = t, y = yth] {data.dat};
\addlegendentry{yth}
\addplot table[x = t, y = y] {data.dat};
\addlegendentry{y}
\end{axis}
\begin{axis}
[
anchor = north,
yshift = -4pt,
xlabel = {$t$ (s)},
ylabel = {$v_{\mathrm{ctrl}}$ (V)},
minor x tick num = 4,
minor y tick num = 1,
]
\addplot table[x = t, y = vctrl] {data.dat};
\end{axis}
\end{tikzpicture}
This produces the following figure:
Note that I shifted the axis a bit:
In the local settings of the lower axis I added yshift = -4pt, which separates the axes from each other.
Without this, the bottom line of the upper axis and the top line of the lower axis lie directly on top of each other.
Sometimes this can be a desired result, but it is not in this case.
When working with complex data it can be helpful to point out certain points, highlight specific parts of the plot or add custom textual annotations. Since pgfplots is based on pgf/tikz it (obviously) helps to be well-versed in tikz for these kinds of things. However, some things come up regularly, so I'm going to show a few techniques and styles to ease these tasks.
A common annotation in plots is highlighting certain areas or values with arrows and text.
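A minimal sketch of such an annotation uses regular tikz drawing commands inside the axis together with the axis cs coordinate system, which interprets coordinates in data units. The plotted function, the positions and the label text are arbitrary choices for illustration.

```latex
\begin{tikzpicture}
\begin{axis}
[
  xlabel = x,
  ylabel = y,
]
\addplot[domain = 0:4] {x^2};
% arrow pointing at the data point (2, 4), with a label at its tail
\draw[<-] (axis cs:2,4) -- (axis cs:1,10) node[above] {$x = 2$};
% lightly shaded area highlighting the range from x = 3 to x = 4
\fill[black, opacity = 0.1] (axis cs:3,0) rectangle (axis cs:4,16);
\end{axis}
\end{tikzpicture}
```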
I frequently encounter situations where I have a lot of data (e.g. output of a simulation), which might either be too dense, so that not all the information is needed, or contain so many points that it exceeds TeX's capabilities. The latter case can be remedied by using lualatex, but even then a large number of samples can become quite impractical. Therefore, filtering the data can be very effective. As I will show, these filtering techniques can also be used for other things like resizing a data range.
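One way to thin out dense data, sketched here under the assumption that plotting only every tenth sample is acceptable, is the each nth point filter of pgfplots. The factor of 10 is an arbitrary choice; the two additional keys silence the warnings about discarded points and make sure the remaining points are still connected.

```latex
\begin{tikzpicture}
\begin{axis}
\addplot+[
  each nth point = 10,            % keep only every tenth sample
  filter discard warning = false, % do not warn about dropped points
  unbounded coords = discard,     % connect the remaining points
] table[x = t, y = y] {data.dat};
\end{axis}
\end{tikzpicture}
```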
Data scaling is something that I probably use for almost every plot I do.
In the simplest case I just want to display the data with certain units.
For instance when showing signals that vary with time I want to display the time axis in nanoseconds.
Now I could scale the data with something like awk or do it in matlab/octave, but pgfplots offers a very simple way out of the box.
It is possible to employ a so-called filter expression for the individual coordinates.
With these, every coordinate is filtered by an expression and replaced by the result of that expression.
Therefore, to display x-coordinates in nanoseconds (assuming they are saved as seconds) I can just multiply every coordinate by 10 to the power of 9:
x filter/.expression = {x * 1e9}
I think that's pretty simple and handy. The filter expression can also be used to add or subtract a constant, which can be useful for instance for plotting several traces in one plot but separating them in the y-domain. Another possible use-case that I had before is to rescale the data to a certain range (for instance digital signals between VSS and VDD, which should be displayed between 0 and 1).
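The three use-cases can be sketched together as follows. The file name signals.dat, its column names, the offset of 1.5 and the supply levels VSS = 0.2 and VDD = 1.0 are all hypothetical values for illustration, and the time column is assumed to be stored in seconds.

```latex
\begin{tikzpicture}
\begin{axis}
[
  xlabel = {t (ns)},
  % applies to all plots in this axis: seconds to nanoseconds
  x filter/.expression = {x * 1e9},
]
% unmodified trace
\addplot table[x = t, y = y] {signals.dat};
% trace shifted up by 1.5 to separate it from the first one
\addplot+[y filter/.expression = {y + 1.5}]
  table[x = t, y = yth] {signals.dat};
% digital signal rescaled from [VSS, VDD] = [0.2, 1.0] to [0, 1]
\addplot+[y filter/.expression = {(y - 0.2) / (1.0 - 0.2)}]
  table[x = t, y = vctrl] {signals.dat};
\end{axis}
\end{tikzpicture}
```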
While I believe that using pgfplots for creating beautiful graphs has a much flatter learning curve than tikz, sometimes it still is handy to apply some knowledge of the latter. Since this guide is meant to show the use of pgfplots for people not well versed in tikz, I will try to cover the most common pitfalls in this section.
In this section I will show a few examples that are complete enough so that the code can just be copied and modified in order to obtain a similar plot with different data.