October 8, 2023
My occupation as a researcher involves writing scientific papers for conferences and journals, which usually impose tight limits on the available space. The conferences for analog integrated circuit design (my line of work) almost exclusively use the standard IEEE format, which is four pages with two columns (some conferences allow a fifth page solely for references). As this is not a lot of space for explaining complex systems, there is typically too much content, and some work is required to fit everything that is needed. In this article, I want to show some techniques for optimizing the content in order to maximize the information density. The techniques described are mostly LaTeX-based, but some approaches work in any word processor.
To give an overview, here is the first page of the paper I submitted to ISCAS 2023:
The first page has a large title, an author section, an abstract and then the content, which is usually divided into sections with headings. The other pages follow this format (excluding the title, the author section and the abstract, of course). The content will also include figures and equations, which is important as these elements influence spacing in a different way compared to regular text.
My writing process typically starts with a first draft containing a lot of preliminary material, without regard to page limits. At this stage, I deliberately produce more content than fits on four pages so that I can then optimize by removing unneeded material. The optimization strategies demonstrated in this article focus on the actual content; I won't be talking about cheating by modifying the spaces between figures and text, changing font sizes, margins etc. I strongly believe that optimizing the actual content, not the markup, helps in creating a concise and well-structured paper with a high information density.
When the content barely fits the available space, gaining merely a few lines can make the difference between a finished paper and one that can't be submitted. On this level of optimization, it is crucial to roughly understand how the content is placed on the pages. As I'm using LaTeX to write papers, I will focus on that process. Therefore, before we talk about optimization strategies, let's delve into the way LaTeX breaks content into paragraphs, pages etc.
In order to properly place (and space) content on a page, LaTeX finds optimal points for line breaks in paragraphs, determines which lines go where on a page, finds the best place for a figure (often a source of frustration for new users) and so on. Typically, paragraphs are set justified, which means that the left and right borders are both straight (as opposed to ragged-left or ragged-right paragraphs). To achieve this, the spaces between words can be stretched or shrunk, and words can be hyphenated. Stretchable spaces are an important concept in LaTeX, as they are used virtually everywhere: between words, between paragraphs, between paragraphs and section headings, between the main content and figure/table captions and so on. In later sections, making sure that the spaces shrink to (or close to) their minimum will, indirectly, be one part of content optimization.
The placement of the content works by putting everything into boxes and then arranging the boxes in the best way. For this, TeX compares many possible break points and picks the one with the lowest cost. Which solution is the best can be influenced by various parameters (which make certain solutions more or less costly). As this article is about a specific paper format (and the defaults usually lead to visually pleasing results), tweaking these parameters is not discussed here. I only mention this because knowing it helps to understand why a certain line or page break happens, which is useful for optimizing the content.
LaTeX will try very hard to ensure that certain things don't happen: Section headings will not be placed without following lines, a paragraph will not be broken across a page if it produces a single line on a page, captions always stay at their respective figures or tables. This means that sometimes removing one line on a page can shift the following content by more than this, possibly a section heading plus two or more lines.
For debugging possible placement improvements, I compile my documents with the preamble switch \raggedbottom.
This suppresses the automatic stretching of space between vertical elements (figures, paragraphs, formulas etc.).
By default, the space between these elements is enlarged if the page is not quite full, so that the last element aligns with the bottom of the page.
The switch disables this, which helps to find possible points for improvement.
This debugging method is more effective when the frame of the document is shown.
The frame marks the boundary of the text area (the region where content is placed, excluding the margins).
This makes it easy to spot pages that are not entirely full.
To show the frame, load the geometry package with the options showframe and pass.
This draws the frame without changing any page dimensions.
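A minimal sketch of such a debug preamble, assuming the IEEEtran conference class and the lipsum package for dummy text. The conditional \ifdebuglayout is a hypothetical switch of my own naming (not part of any package), so both debugging aids can be turned off with a single line before the final build:

```latex
\documentclass[conference]{IEEEtran}

% Hypothetical switch: change to \debuglayoutfalse for the final build.
\newif\ifdebuglayout
\debuglayouttrue

\ifdebuglayout
  % Do not stretch vertical space to fill each column, so unused
  % space at the bottom of a column becomes visible.
  \raggedbottom
  % Draw the text-area frame; 'pass' keeps the page dimensions
  % defined by the document class.
  \usepackage[pass, showframe]{geometry}
\fi

\usepackage{lipsum} % dummy text, only for this demo

\begin{document}
\lipsum[1-8]
\end{document}
```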
The following example shows the mentioned debugging techniques.
The content of the paper is shown in an abstract way, as the effect is independent of the actual content (this also reduces the file size by 90 percent).
The image on the left shows a page with acceptable typographical quality, although the space between the sections in the right column (the smaller rectangle in the middle is a section heading) is quite large.
Turning on the debugging switches reveals why: there is room for two more lines at the bottom of the right column that is not used.
This is because the last paragraph on this page is followed by a new section.
As the section heading can't be separated from the following lines and isolated single lines are avoided, this is the better solution, at least in terms of the LaTeX placement algorithm (and corresponding parameters such as the clubpenalty).
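The clubpenalty mentioned above is one of several TeX parameters that discourage isolated lines. In keeping with this article's approach, the sketch below does not change them; it only writes their current values to the log so you can see what the class sets. It assumes the IEEEtran conference class. As a reference point, TeX treats a penalty of 10000 or more as infinite, meaning the corresponding break is never taken:

```latex
\documentclass[conference]{IEEEtran}
\begin{document}
% Write the penalties that discourage isolated lines to the log.
% clubpenalty:         break after the first line of a paragraph
% widowpenalty:        break before the last line of a paragraph
% displaywidowpenalty: break before the last line preceding a display
\typeout{clubpenalty = \the\clubpenalty}
\typeout{widowpenalty = \the\widowpenalty}
\typeout{displaywidowpenalty = \the\displaywidowpenalty}
Some text.
\end{document}
```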
These debugging techniques will be used in the following sections in order to find possible spots for easy improvement. Next, we will delve into actual content optimization.
The LaTeX package microtype introduces micro-typographic modifications (hence the name).
Among these modifications are character protrusion (letting some characters, such as periods, extend slightly into the margin) and font expansion (slightly stretching or shrinking the characters of a line).
These modifications usually lead to a more visually pleasing result and, as a side effect, are quite effective in reducing the space occupied by a short paper.
For example, compiling the shown paper without microtype pushes four references onto the fifth page, the equivalent of a medium-size paragraph.
So when only a few lines need to fit onto the last page, simply including microtype can already fix the problem.
The following example shows the effect of the microtype package.
It is an excerpt of the camera-ready paper I presented at ISCAS 2023 (page 2, bottom).
The left image shows the paper as it would look if compiled without microtype; the right image includes the package.
With microtype, a new paragraph already starts on this page.
Compare the location of the sentence "Clock crossing issued in the proposed design (...)", which is on the last line in the left picture and on the fifth-last line in the right picture.
So using microtype alone already saves four lines!
(There are two typos in that sentence, but I decided to include this excerpt as published. The correct sentence is "Clock issues in the proposed design were checked (...)".)
The images depict the second page, but the savings due to the microtype package become more pronounced over the course of the paper, as they accumulate.
The version without microtype puts seven lines and one section heading on the fifth page, whereas the paper with microtype fits on four pages.
This means that for content that almost fits the available space, using microtype can already be enough.
It does not change any content, so there is no compromise between content and space.
For this reason (and, of course, for the typographical improvements), microtype should always be included.
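To reproduce the with/without comparison on your own paper, the sketch below wraps microtype in a hypothetical switch (\ifusemicrotype, not part of any package). It assumes pdfLaTeX, where, as far as I know, microtype enables both protrusion and expansion by default; the options are spelled out only for clarity. After compiling each variant, the page count appears in the log line "Output written on ... (N pages, ...)":

```latex
\documentclass[conference]{IEEEtran}

% Hypothetical switch for an A/B comparison of the page count.
\newif\ifusemicrotype
\usemicrotypetrue

\ifusemicrotype
  % Character protrusion and font expansion, spelled out explicitly.
  \usepackage[protrusion=true, expansion=true]{microtype}
\fi

\usepackage{lipsum} % dummy text, only for this demo

\begin{document}
\lipsum[1-20]
\end{document}
```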
Figures play an important role in scientific papers. The space they occupy varies widely, but in a two-column layout no text flows around figures, so they always break the text flow. Any space to the left and right of a figure is therefore wasted, which means that pictures should be optimized for minimum height and maximum width. Furthermore, since the bounding box of a picture is rectangular, its content should be as rectangular as possible. For instance, the next figure shows an excerpt from my paper for ISCAS 2021. The first image places the 'TX' and 'RX' labels outside of the boxes. In the second image, they are placed inside the boxes, saving some space. (Note that I used the first version in the paper, because I had enough room. Furthermore, for this to work, I believe the labels should be emphasized so that they stand out from the rest of the annotations.)
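A rough TikZ sketch of the second variant, assuming two plain rectangular blocks; the style names block and blocklabel are hypothetical. Each label is anchored inside the upper-left corner of its box, so it adds nothing to the bounding box, and it is set in bold to stand out from other annotations:

```latex
\documentclass[tikz]{standalone}
\begin{document}
\begin{tikzpicture}[
    font = \small,
    % Hypothetical styles for the blocks and their labels.
    block/.style = {draw, minimum width = 3cm, minimum height = 1.5cm},
    blocklabel/.style = {anchor = north west, font = \small\bfseries},
  ]
  \node[block] (tx) at (0, 0) {};
  \node[block] (rx) at (5, 0) {};
  \draw[->] (tx) -- (rx);
  % Labels inside the boxes: the bounding box stays that of the blocks.
  \node[blocklabel] at (tx.north west) {TX};
  \node[blocklabel] at (rx.north west) {RX};
  % A label above a box would enlarge the bounding box instead:
  % \node[anchor = south west] at (tx.north west) {TX};
\end{tikzpicture}
\end{document}
```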
Optimizing the bounding box of pictures can often be done without compromising the content. For example, for data plots with a long y-axis label, breaking that label into multiple lines can again save a few lines of space. The following example is again a figure from my ISCAS 2023 paper, where the noise plot has a rather long y-label. In the final paper, I included a line break in the label. Without that break, the content of the paper no longer fits onto five pages.
As you can see, the y-label extends above the upper axis line, which enlarges the bounding box of the figure. (This is not the case at the bottom, where the height is determined by the x-axis label.)
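A minimal pgfplots sketch of the same idea. The label text, the sizes and the dummy curve are placeholders, not the figure from the paper. In pgfplots, a manual line break inside an axis label only takes effect when the label style sets an alignment:

```latex
\documentclass{standalone}
\usepackage{pgfplots}
\pgfplotsset{compat=1.18}
\begin{document}
\begin{tikzpicture}
  \begin{axis}[
      width = 8cm,
      height = 3cm,
      xlabel = {Frequency (Hz)},
      % 'align = center' is required for '\\' to break the label.
      ylabel style = {align = center},
      % Hypothetical label, split into two short lines so the rotated
      % label is shorter than the axis.
      ylabel = {Phase noise\\(dBc/Hz)},
    ]
    \addplot[domain = 0:1, samples = 20] {x^2}; % dummy curve
  \end{axis}
\end{tikzpicture}
\end{document}
```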
Besides making sure that the bounding box is optimal, a further optimization for figures of course deals with the content.
In my opinion, most images can be shrunk a little by placing elements closer together, keeping only the vital information, etc.
Note that I don't recommend meddling with the font size.
While I generously use \small for the text in my figures, I only rarely use \scriptsize.
As figures are meant to be read and understood, very small font sizes should generally be avoided.
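If \small is your default for figures, it can be set once for all TikZ pictures instead of in every picture. A minimal sketch, assuming the figures are drawn with TikZ; individual nodes can still override the size where a smaller one is really justified:

```latex
\documentclass[conference]{IEEEtran}
\usepackage{tikz}
% Default font size for the text of all TikZ pictures in the document.
\tikzset{every picture/.append style = {font = \small}}
\begin{document}
\begin{tikzpicture}
  \node[draw] (a) {Main label};
  % A deliberate, rare exception for a minor annotation.
  \node[font = \scriptsize, anchor = north] at ([yshift = -2mm]a.south)
    {minor note};
\end{tikzpicture}
\end{document}
```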
Optimizing figure height without removing content is most easily applied to plots.
Here, the height is arbitrary.
It should be large enough for everything to be presented in a readable way, but there is no benefit in making it any larger.
Generally, I recommend using the same height for all plots in a paper.
Of course there are special plots where you might want to use a different height, but for all other typical plots, use the same height.
I create all my plots with pgfplots, which makes it especially easy to optimize the plot height.
In my document preamble I have the following:
\pgfplotsset{
every axis/.append style = {
height = 3cm,
},
}
This lets me set the height of all plots in the document (at least of those that do not explicitly override it) with one single line. With this, optimizing the content of the paper becomes trivial: reduce the height until everything fits in the paper, as long as this does not produce unreadable plots.
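Building on the preamble above, a minimal sketch of how an individual plot can deliberately opt out of the shared height. It assumes, as I understand pgfplots, that options given directly to an axis are processed after the every axis style and therefore take precedence; the plotted functions are placeholders:

```latex
\documentclass{article}
\usepackage{pgfplots}
\pgfplotsset{compat=1.18}
% Document-wide default height for all plots.
\pgfplotsset{
  every axis/.append style = {
    height = 3cm,
  },
}
\begin{document}
% A typical plot: uses the default height of 3 cm.
\begin{tikzpicture}
  \begin{axis}[width = 8cm]
    \addplot[domain = 0:1] {x};
  \end{axis}
\end{tikzpicture}

% A special plot: the local option overrides the default height.
\begin{tikzpicture}
  \begin{axis}[width = 8cm, height = 5cm]
    \addplot[domain = 0:1] {x^2};
  \end{axis}
\end{tikzpicture}
\end{document}
```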
At the lowest level of typographical structure (chapters, sections, etc.) there are paragraphs. A paragraph should express one thought. Just like you should not write about, say, seals in a section about bears, you should not combine several thoughts in one paragraph. I typically find this a bit harder to achieve than partitioning the structure into sections, but good paragraph design is important and should be valued. In a well-structured text, you should be able to find something by skimming the paragraphs and skipping to the next one whenever the thing you are looking for is not mentioned (without reading the entire paragraph).
A paragraph is broken into lines and therefore occupies several lines on a page. Typically, the text is set justified, so there is no empty space at either end of a line. There are two exceptions: the first line is often indented to mark a new paragraph, and the last line is not necessarily full and therefore ends with empty space. This is where the optimization comes in. The goal is to find last lines that are almost empty. The paragraph is then re-written to save a few words and, with them, a line.
To do this systematically, one first has to find the paragraphs that actually matter. As explained earlier, not every saved line changes anything in the layout other than the vertical spacing between elements. Turning on the debugging switches helps to find lines worth optimizing. In my experience, most lines can impact the layout. However, critical ones are often at the end of a page (where the next paragraph can no longer start on that page), especially if they are immediately before a section heading. Before re-writing a paragraph, I recommend commenting out the words on its last line to see whether this makes a difference. If it does, start re-writing.
Re-writing paragraphs to save lines should be done backwards. This is because changing text at the beginning often has no impact on the end of a paragraph (unless, of course, many words are removed). Look at the last two or three sentences first; this is typically where the most can be improved. If this does not help, consider moving further back through the paragraph.
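To check whether a rewrite actually saved a line without counting lines in the PDF, TeX's \prevgraf register can help: after a paragraph ends, it holds the number of lines that paragraph was broken into. A minimal sketch, where \reportlines is a hypothetical helper name and lipsum provides dummy text; compile before and after the rewrite and compare the numbers in the log:

```latex
\documentclass[conference]{IEEEtran}
\usepackage{lipsum}

% Hypothetical helper: end the current paragraph and write the number
% of lines it occupied, plus the source line of the call, to the log.
\newcommand{\reportlines}{%
  \par
  \typeout{Input line \the\inputlineno: \the\prevgraf\space lines}%
}

\begin{document}
\lipsum[1]
\reportlines
\end{document}
```

Display equations inside a paragraph count as three lines each in \prevgraf, so the number is most meaningful for plain text paragraphs.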
When re-writing, look for redundant words, expressions and complicated constructions involving passives; these can often be phrased in a more concise and space-efficient way. A nice side effect is that this can also improve your writing. Scientific writing should use short, clear sentences without complex constructions, as these only obstruct understanding. When writing a text, one should not consider any paragraph finished. I'm often surprised by how much can be optimized in paragraphs I wrote weeks ago and had considered "stable" and polished.
The following example shows the abstract of my ISCAS 2023 paper. The left paragraph is the version I used for the blind submission, the right one is the final version. I've marked the words that I changed in the final version of the abstract. While the content of both versions is basically the same, the re-phrasing on the right saved one line. This might not seem like much, but it allows two lines of the main body to move from the second page to the first. Given the complex placement of the entire content, this makes a big difference.
In this paper, a novel charge pump for sub-sampling phase-locked loops (SSPLLs) is presented. Contrary to the traditional charge pump, the proposed implementation eliminates the previously-required pulser. This is achieved by using all sample data from the ping-pong sub-sampling phase detector as opposed to only every second point, which enables the charge pump to run pseudo-continuous. This virtually raises the reference frequency by a factor of two, which is beneficial for the phase noise performance of the phase-locked loop while fulfilling the requirements for bandwidth of reference buffers, switches etc. Furthermore, eliminating the pulser enables a highly power-efficient charge pump design, leveraging even higher SSPLL figure-of-merit. The proposed charge pump is implemented in a 22-nm fully-depleted silicon-on-insulator technology. While maintaining the effective gain, noise and offset performance of the reference design, the power and area consumption is reduced by roughly 80 % and 55 %, respectively.
In this paper, a novel charge pump for sub-sampling phase-locked loops (SSPLLs) is presented. Contrary to the conventional charge pump, the proposed implementation eliminates the previously-required pulser. This is achieved by using all sample data from the ping-pong sub-sampling phase detector as opposed to only every second point, which enables the charge pump to run pseudo-continuous. This virtually raises the reference frequency by a factor of two, which is beneficial for the phase noise performance of the phase-locked loop while fulfilling the requirements for bandwidth of reference buffers, switches etc. Furthermore, eliminating the pulser enables a highly power-efficient charge pump design, leveraging higher SSPLL FoM. The proposed charge pump is implemented in a 22-nm fully-depleted silicon-on-insulator technology. The power and area consumption are reduced by roughly 80 % and 55 %, with similar effective gain, noise and offset performance compared to the conventional design.
A careful reader will notice that I apply this throughout my papers. The final version of the ISCAS 2023 paper does not have a single paragraph whose last line is half-full or less. On the contrary, many paragraphs have very full last lines, some even entirely full (second page, fifth paragraph; fourth page, first paragraph; fourth page, third paragraph). My ICECS 2022 paper had even more space issues, resulting in seven paragraphs with a full last line. These optimizations are difficult and took me quite some time, but I had to fit in a lot of content.
The text of a paper makes up a significant part of its content. It is broken into lines by the line-breaking algorithm, which we will look at more closely further on. The individual lines are then assembled into a paragraph, which is placed on a page. This involves another algorithm, which also influences how other elements such as figures and section headings are placed. To optimize the content, it is crucial to understand how these parts interact.
Paragraphs in LaTeX can be broken across columns and pages, but a single line of a paragraph standing by itself is considered typographically unpleasant. The placement algorithm therefore only creates such layouts when the alternative is even worse. In my experience, this rarely happens in well-formed papers (where there is no almost-empty page). This in turn means that even if there is room for another line at the bottom of a column, it will not be placed there if this would leave a single line in the next column. Effectively, two lines are then placed in the next column, wasting an entire line of space in the previous column. A similar effect can occur with section headings: these are not placed at the end of a column without any following text, so they get pushed to the next column, again wasting space. All these intricate details of the page-breaking algorithm in TeX mean that saving a line on an early page can save many lines on later pages. Understanding this is important to truly optimize the content of a short document such as a paper, where space is very constrained.
This article shows various techniques for optimizing content in short documents, where available area is a precious resource.
The shown methods are simple to apply, and some don't even affect the actual content (as opposed to removing material).
Furthermore, I showed some debugging methods that can reveal wasted space.
In conclusion: for debugging, turn on \raggedbottom and use \usepackage[pass, showframe]{geometry}.
For optimizing content, use the microtype package and make sure you find the right height for your figures (and only those that matter!).
As a last resort, re-write some paragraphs to save some lines.
The following example shows all tracked versions of the paper I submitted to ISCAS 2023 (see here for the actual paper). It starts with my initial draft and shows the iterations up to the first blind submission and on to the final camera-ready paper. I've omitted the fifth page, which only contains references. The buttons can be used to navigate through the version history.
There are some interesting things happening here. Figure 5 (numbering refers to the final submission), which shows the simulated supply current, originally showed a negative current (due to the way I simulated it). While this does not change the message, it makes much more sense to show its absolute value. This went unnoticed for a long time until one of my colleagues pointed it out, which is why it changed after I received comments on the paper. Another interesting detail concerns figures 2 and 3 (again referring to the final version). The reviewers requested that the implementation of the conventional charge pump be shown (figure 2), as well as the related signals for both the conventional and the proposed circuit. This took me some time, as adding an entire figure is quite heavy on space, and figure 3 was already quite dense (or so I thought at the time). Still, I managed to fit in both the new figure and the signals. Notice how figure 3 changes from version 37 to 38: the devices are closer together and there is room for the signals on the right. This definitely makes it a better paper, demonstrating my points from above about figure optimization.
So how did I fit an entire new figure into the paper without giving up on content? Look at the first page (versions 36 and 37): in version 36, the second paragraph of section I has two lines in the right column. One of them could be placed in the left column, but LaTeX does not do this (as explained before). The last line of the abstract is barely half-full, which made me re-write it. This in turn made it possible to fit the entire second paragraph of section I into the left column, which then puts two more lines of the first paragraph of section II on this page. Besides this, I reduced the plot height, which (as there are three plots) saves a considerable amount of space. These changes allow the beginning of section III (Noise Analysis) to move up, even though there is a new figure. With the smaller plots, the sections after III don't really need to be changed, as there is already enough room for them.
Another thing worth noticing is that the version I sent to my colleagues was not yet ready for submission (as can be seen from the missing acknowledgements, which also have to fit on the fourth page). I usually try to avoid sending unfinished papers to colleagues, but in this case there was only one day left, so I was only interested in major issues. Furthermore, in the first versions one can see that I simply put in any vague ideas I had. There are claims without references, unfinished sentences, unimportant figures and so on.
Lastly, while skipping through versions 2 to 31 (their labels simply follow my commit history and carry no real meaning), one can notice how I juggle some thoughts. For instance, some figures appear and later disappear again (such as the plots showing the sampled signals), or figures get incorporated into other ones (the pulser, which exists as a stand-alone figure only in versions 17 to 22). Other figures are added early on as placeholders and reminders for myself, before I even had the data for them (the gain variation is present from version 11, but its data was only added in version 24). Additionally, going from version 36 to 37, I added an entire new figure (as suggested by one reviewer), which takes up a lot of space. In this version jump, a lot of re-writing goes on in order to fit the new figure. Finally, in the camera-ready version I also added some signals to the main circuit architecture, which took some work re-positioning the circuit elements. I did not think this was possible, but the end result is quite good, in my opinion. This shows how far content optimization can go, squeezing in more and more information.