BASH Script to generate PDF of Source Code with Syntax Highlighting using LaTeX

Sometimes, when I'm learning, I like to print source code on paper because I find it easier to read and nicer to annotate.

I had a rummage online to see if anyone had come up with a nice way to generate PDFs of source code, and improved what I found into this useful BASH script. The script searches for source code in the current directory and its subdirectories, and uses the typesetting software LaTeX to create a PDF of the contents with syntax highlighting.

Summary and Defaults

The script is interactive, and will prompt you on the following points:

  1. Choice of document title (for the front page). The default is to use the name of the current directory.
  2. Choice of file extensions (defaults to .h, .cpp and .qml files). This is useful for omitting boilerplate you're not interested in, such as Makefiles, and also reduces the chance of accidentally including binary files (which would cause LaTeX to throw a load of errors). Provide a list of one or more extensions without the leading dot (for example: h cpp qml), separated by single spaces.
  3. Choice of whether to place .h files in front of their matching .cpp files (defaults to yes). The source files are sorted alphabetically before inclusion in the PDF, so without this option each header file would appear after the .cpp file of the same name, which is inconvenient.
  4. Option to review the files that will be included, and exit cleanly if there is a problem with the list.
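
The header re-ordering in point 3 can be tried on its own. The following is only a minimal sketch of the idea, not the script's exact code: the function name headers_first and the sample file names are made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch of the header/implementation swap on an already sorted list.
# headers_first and the sample file names are hypothetical.

headers_first() {
    local files=("$@")
    local i stem
    for (( i = 0; i < ${#files[@]} - 1; i++ )); do
        # Only a .cpp file directly followed by its own .h file is swapped
        if [[ ${files[i]} == *.cpp ]]; then
            stem=${files[i]%.cpp}
            if [[ ${files[i+1]} == "$stem.h" ]]; then
                files[i+1]=${files[i]}
                files[i]="$stem.h"
            fi
        fi
    done
    printf '%s\n' "${files[@]}"
}

# Alphabetical order puts widget.cpp before widget.h; main.cpp has no header
headers_first main.cpp widget.cpp widget.h
```

With this input the list should come back as main.cpp, widget.h, widget.cpp: only the adjacent pair with the same base name changes places.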

The script then creates a LaTeX source file: it prints a header (which determines the document's formatting) into a temporary .tex file, followed by a new section for each source file, so that each file starts on a new page with its own title. The source code itself is included through the LaTeX listings package, which provides the syntax highlighting.
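
To make the per-file part concrete, here is a rough sketch of the three LaTeX lines written for each source file. The helper name emit_section and the sample path are hypothetical. The underscore escaping is my own assumption rather than something the script below does: a raw underscore in a \section title is normally a LaTeX error outside math mode, so file names containing one may need this treatment.

```bash
#!/usr/bin/env bash
# Sketch of the LaTeX emitted for one source file.
# emit_section is a hypothetical helper; the underscore escaping is an
# addition that the script in this post does not perform.

emit_section() {
    local path=$1
    local title
    # Escape "_" for the section title only
    title=$(printf '%s' "$path" | sed 's/_/\\_/g')
    printf '%s\n' '\newpage'                                   # each file starts on a new page
    printf '%s\n' "\\section{$title}"                          # section heading, also used in the TOC
    printf '%s\n' "\\lstinputlisting[style=customasm]{$path}"  # file contents as a listing
}

emit_section src/main_window.cpp
```

The path handed to \lstinputlisting is left untouched, because it is read as a file name rather than typeset.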

Installation

If you don't have LaTeX installed, you'll need to install it first. On Debian derivatives such as Ubuntu or Raspbian:

sudo apt-get update
sudo apt-get install texlive-latex-base texlive-latex-extra
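
If you want to confirm the installation worked, something along these lines should do. It is only a sketch: require_tool is a made-up helper name, and the kpsewhich lookup assumes a TeX Live installation (which is what the packages above provide).

```bash
#!/usr/bin/env bash
# Sketch of a post-install check. require_tool is a hypothetical helper.

require_tool() {
    # command -v succeeds only if the name resolves to something runnable
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

require_tool pdflatex || echo "pdflatex not on PATH, install the packages above first"

# kpsewhich prints the path of a file that TeX can find
if command -v kpsewhich >/dev/null 2>&1; then
    kpsewhich listings.sty || echo "listings.sty not found"
fi
```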

You can choose any name you like for the script, but I named it src2pdf.

If you don't have a bin subdirectory for scripts in your home directory, create one:

mkdir ~/bin

Create a file and copy and paste the source (CTRL+SHIFT+V to paste in most terminal emulators):

nano ~/bin/src2pdf

Make the script executable:

chmod +x ~/bin/src2pdf

If you want to add the bin directory to your path (so you can type src2pdf instead of ~/bin/src2pdf), append this line to your ~/.bashrc:

PATH=$PATH:~/bin

And reload the settings:

source ~/.bashrc

The Script

To run the script, just change directory into the folder where the source code is found and then run the src2pdf command.

#!/usr/bin/env bash

# CREATE PDF FROM SOURCE CODE
# original source http://superuser.com/questions/601198/how-can-i-automatically-convert-all-source-code-files-in-a-folder-recursively
# source code file names must not contain spaces

read -p "Please type the document title (blank to use ${PWD##*/}) : " answer

if [[ $answer == "" ]]; then
    title=${PWD##*/}
else
    title=$answer
fi


# if output files already exist, delete them
if [ -f ./tmp.aux ] || [ -f ./tmp.log ] || [ -f ./tmp.out ] || [ -f ./tmp.pdf ] || [ -f ./tmp.toc ] ; then
    echo "Removing old output files..."
    rm ./tmp.*
fi

tex_file=$(mktemp) ## Random temp file name

if [ $? -ne 0 ]; then
    echo "ERROR: failed to create temporary file"
    exit 1;
fi

# DOCUMENT HEADER

cat<<EOF >$tex_file   ## Print the tex file header
\batchmode
\documentclass[titlepage,twoside]{article}

%\usepackage{showframe}
%\usepackage[inner=2cm,outer=4cm]{geometry}
%\usepackage[]{geometry}
\usepackage[inner=2.5cm,outer=2.5cm,bottom=2.5cm]{geometry}


\usepackage{listings}
\usepackage[usenames,dvipsnames]{color}  %% Allow color names
\lstdefinestyle{customasm}{
  belowcaptionskip=1\baselineskip,
  xleftmargin=\parindent,
  language=C++,   %% Change this to whatever you write in
  breaklines=true, %% Wrap long lines
  basicstyle=\footnotesize\ttfamily,
  commentstyle=\itshape\color{Gray},
  stringstyle=\color{Black},
  keywordstyle=\bfseries\color{OliveGreen},
  identifierstyle=\color{blue},
  %xleftmargin=-8em,
}        

\usepackage[colorlinks=true,linkcolor=blue]{hyperref} 

\begin{document}

\title{$title}
\author{Sam Hobbs}
\maketitle

\pagenumbering{roman}
\tableofcontents

\newpage
\setcounter{page}{1}
\pagenumbering{arabic}
EOF

###############

# ask the user which file extensions to include

read -p "Provide a space separated list of extensions to include (default is 'h cpp qml') : " answer

if [[ $answer == "" ]]; then
    answer="h cpp qml"
fi

# replace spaces with double escaped pipe using substring replacement  http://www.tldp.org/LDP/abs/html/parameter-substitution.html

extensions="${answer// /\\|}"

###############

# FINDING FILES TO INCLUDE
# inline comments http://stackoverflow.com/questions/2524367/inline-comments-for-bash#2524617
# not all of the conditions below are necessary now that the regex for c++ files has been added, but they don't harm

filesarray=(
$(
find .                                          `# find files in the current directory` \
        -type f                                 `# must be regular files` \
        -regex ".*\.\($extensions\)"            `# only files with the chosen extensions (.h, .cpp and .qml) by default` \
        ! -regex ".*/\..*"                      `# exclude hidden directories - anything slash dot anything (Emacs regex on whole path https://www.emacswiki.org/emacs/RegularExpression)` \
        ! -name ".*"                            `# not hidden files` \
        ! -name "*~"                            `# don't include backup files` \
        ! -name 'src2pdf'                       `# not this file if it's in the current directory`
))

###############

# sort the array https://stackoverflow.com/questions/7442417/how-to-sort-an-array-in-bash#11789688
# internal field separator $IFS https://bash.cyberciti.biz/guide/$IFS

IFS=$'\n' filesarray=($(sort <<<"${filesarray[*]}"))
unset IFS

###############

read -p "Re-order files to place header files in front of cpp files? (y/n) : " answer

if [[ ! $answer == "n" ]] && [[ ! $answer == "N" ]] ; then
    echo "Re-ordering files..."

    # if this element is a .cpp file, check the next element to see if it is a matching .h file
    # if it is, swap the order of the two elements
    re="^(.*)\.cpp$"

    # this element is ${filesarray[$i]}, next element is ${filesarray[$i+1]}
    for (( i=0; i<=$(( ${#filesarray[@]} -1 )); i++ ))
    do 
        # if the element is a .cpp file, check the next element to see if it is a matching .h file
        if [[ ${filesarray[$i]} =~ $re ]]; then
            header=${BASH_REMATCH[1]}
            header+=".h"
            if [[ ${filesarray[$i+1]} == "$header" ]]; then
                # replace the next element in the array with the current element
                filesarray[$i+1]=${filesarray[$i]}
                # replace the current element in the array with $header
                filesarray[$i]=$header
            fi
        fi
    done
fi

###############

# Change ./foo/bar.src to foo/bar.src
IFS=$'\n' filesarray=($(sed 's/^\..//' <<<"${filesarray[*]}"))
unset IFS

###############


read -p "Review files found? (y/n) : " answer

if [[ $answer == "y" ]] || [[ $answer == "Y" ]] ; then

    echo "The following files will be included in the document..."

    for i in "${filesarray[@]}"
    do
        echo "$i"
    done

    # allow the user to abort
    read -p "Proceed? (y/n) : " answer
    if [[ $answer == "n" ]] || [[ $answer == "N" ]] ; then
        exit 0
    fi

fi

###############

# create a .tex file with each section on its own page

echo "Creating tex file..."

for i in "${filesarray[@]}"
do
    echo "\newpage" >> $tex_file   # start each section on a new page
    echo "\section{$i}" >> $tex_file  # create a section for each source file
    echo "\lstinputlisting[style=customasm]{$i}" >>$tex_file # place the contents of each file in a listing
done

echo "\end{document}" >> $tex_file

###############

# run pdflatex twice to produce TOC
echo "Creating pdf..."
echo

pdflatex $tex_file -output-directory . 

if [ $? -ne 0 ]; then
    echo "ERROR: pdflatex command failed on first run, refer to tmp.log for more information"
    exit 1;
fi

pdflatex $tex_file -output-directory .

if [ $? -ne 0 ]; then
    echo "ERROR: pdflatex command failed on second run, refer to tmp.log for more information"
    exit 1;
fi

###############

echo "Renaming output files..."

mv tmp.pdf "$title.pdf"   # quoted so that a title containing spaces still works

echo "Cleaning up..."

rm ./tmp.*
rm -f "$tex_file"   # also remove the temporary .tex source created by mktemp

echo "Done, output file is $title.pdf in this directory"
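
The extension handling in the script is easier to follow in isolation. As a small sketch (build_regex is a hypothetical helper name, and it uses sed rather than the script's substring replacement), this is how a space-separated list becomes the pattern given to find -regex, which uses Emacs-style regular expressions by default in GNU find:

```bash
#!/usr/bin/env bash
# Sketch: turn a space-separated extension list into a find -regex pattern.
# build_regex is a hypothetical helper name.

build_regex() {
    local list=$1
    local alternatives
    # Replace each space with \| (alternation in Emacs-style regex)
    alternatives=$(printf '%s' "$list" | sed 's/ /\\|/g')
    # Match any path that ends in a dot followed by one of the extensions
    printf '%s\n' ".*\\.\\($alternatives\\)"
}

build_regex "h cpp qml"
```

For the default list this prints .*\.\(h\|cpp\|qml\), which matches whole paths such as ./src/main.cpp.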

If you want to change the appearance of the document (e.g. modify the margin widths), you can modify the LaTeX header in the script.
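
For example, the margins could be pulled out into variables near the top of the script. This is just a sketch under that assumption: the variable names and the geometry_line function are hypothetical, and the script as posted hard-codes 2.5cm for all three values.

```bash
#!/usr/bin/env bash
# Sketch: margin widths as variables instead of hard-coded values.
# The variable and function names are hypothetical.

inner_margin="2.5cm"
outer_margin="2.5cm"
bottom_margin="2.5cm"

geometry_line() {
    # The here-document delimiter is unquoted, so $variables are expanded
    # while LaTeX commands such as \usepackage pass through unchanged.
    cat <<EOF
\usepackage[inner=$inner_margin,outer=$outer_margin,bottom=$bottom_margin]{geometry}
EOF
}

geometry_line
```

The same trick is what lets the script drop $title into the \title line of the header.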

Questions/comments/improvements? Let me know in the comments.
