At surface level, this seems like a simple issue task. But there’s actually a lot more to it than meets the eye.
Keep in mind
Generating a pdf file on the server and returning it to the client will always always take time. Any hosting service you choose, whether AWS, heruko , vercel etc. will limit your execution to a time frame.
Why not generate the PDF on the client via window.print()
?
Client side pdf generation is good, but it. has a multitude of issues:
- It assumes print by default: You may want a different set-up for your pdf than your print view.
- It relies on the client’s browser, device etc. Which are too many to account for.
- The client might have dark mode turned on. By default, the browser takes the theme of the OS. This is actually.
Sumamry:
-
Install the following:
chrome-aws-lambda@6 cors@2.8.5 express@4.18.1 puppeteer-core@6
-
Implement pdf generation as you require.
-
Upgrade your vercel membership to premium.
The challenge:
converting a webpage (or part thereof) to html is hefty working the server resources.
So there are limitations which you will likely run into when attempting to do so with puppeteer:
- Vercel imposes a 50mb serverless function size. This limit is also imposed upon vercel themselves by aws.
- The path to the chromium executable.
- The serverless function timeout: Vercel imposes a 5 seconds timeout to serverless functions on the free tier. This isn't enough and this is exactly why you will probably need to upgrade to the membership.
Server
The API interface can either be a basic express server, or a nextjs serverless function.
In both cases we will expose one GET route.
app.get("/api", async (req, res) => {});
Within this route, we will try and launch a browser instance as a background process :
chrome = require("chrome-aws-lambda");
puppeteer = require("puppeteer-core");
let browser = await puppeteer.launch(options);
Options is your configuration object, which will customize the browser instance:
options = {
args: [
...chrome.args,
"--hide-scrollbars",
"--disable-web-security",
"--font-render-hinting=none",
],
defaultViewport: chrome.defaultViewport,
executablePath: await chrome.executablePath,
headless: true,
ignoreHTTPSErrors: true,
};
Once the browser instance has been created, we’ll navigate to the desired page
const page = await browser.newPage();
await page.goto(url, {
waitUntil: "networkidle0",
timeout: 0,
});
``
networkidle0
is the most reliable indicator of ready navigation since it waits 500ms after the last network request.
This ensure your page won’t have any missing content prior to making any further operations.
Once that’s done, we’re going to create a pdf of the webpage
const pdfBuffer = await page.pdf(pdf_settings);
Your PDF configuration object can be any configuration theme you desire. In my case it looks something like this:
const pdf_settings = {
format: "a4",
printBackground: true,
margin: {
top: "0px",
right: "0px",
bottom: "0px",
left: "0px",
},
scale: 1,
};
The page.pdf
method returns a pdf buffer data which look something like this
%PDF-1.3
%����
8 0 obj
<<
/Type /ExtGState
/ca 1
>>
endobj
11 0 obj
<<
/Type /ExtGState
/ca 0.4
>>
Ammar Halees
At this point, you have the choice on how you send this data back to your client.
This is what i do:
res.setHeader("access-control-allow-origin", OriginURL);
res.setHeader("access-control-request-method", "GET");
res.setHeader("Content-Type", "application/pdf");
res.setHeader("Content-Disposition", `attachment; filename=${title}.pdf`);
Where OriginURL
is your domain. And title
will be the default suggested document name by your browser.
If you want more robust CORS handling, you can create middlewear using the cors
package we’ve installed.
Last but not least, we start our server:
app.listen(process.env.PORT || 3001, () => {
console.log("Server started");
});
Client
On the client, we are receiving a “file” . To download that file, we simply use the “download” attribute with the anchor tag:
<a href="/api/pdf" download>
By not providing a name and extension, the will be inferred by the browser.
We can also specify the default name and extension for the downloaded file:
<a href="/api/pdf" download="filename.png">
This is one approach. Another approach if you want more nitty gritty control on the network request and its timings is to do create the anchor tag on the fly, use its download functionality and then then destroy it.
As such:
const handleDownload = async () => {
setisDownloadLoading(true);
const res = await fetch(`/api/pdf`, {
method: "GET",
headers: {
Accept: "application/pdf",
},
});
// if (!res) throw new Error("no data");
const blobcontent = await res.arrayBuffer();
// if (!blobcontent) throw new Error("no data");
const downloadFile = () => {
try {
const link = document.createElement("a");
// create a blobURI pointing to our Blob
const data = Buffer.from(blobcontent).toString("base64");
// if (!data) throw new Error("no data");
const uri = "data:application/pdf;base64," + data;
link.href = uri;
link.download = `${resumeObject?.name}.pdf`;
// some browser needs the anchor to be in the doc
document.body.append(link);
link.click();
link.remove();
setisDownloadLoading(false);
// in case the Blob uses a lot of memory
setTimeout(() => URL.revokeObjectURL(link.href), 7000);
} catch (e) {
console.log(e);
setisDownloadLoading(false);
window.print();
}
};
downloadFile();
};
In the above example, we are using the arrayBuffer method of the fetch response object to extract the pdf buffer data.
const blobcontent = await res.arrayBuffer();
This buffer data is actually raw and cannot be displayed as is. It’s what’s called a “blob”.
Blob data on its own cannot be interpretted by the anchor tag’s href attribute.
So what we do is that we need to convert this Blob data into something called a “Data URL”.
Data URLs are interpreted by the href attribute. But they can be verbose and massive.
A Data URI has the following format
data:[<mediatype>][;base64],<data>
Where data in our case will be base64 encoded represenatition of the array buffer data.
We can achieve 64-bit encoding of buffer data by using the form()
method of the Buffer object, and then sequentially using the toString
method and passing “base64”
to it.
const data = Buffer.from(blobcontent).toString("base64");
``
<mediatype>
in our case will be application/pdf .
const uri = "data:application/pdf;base64," + data;
The rest of the code will be appending and revoking the anchor item from the DOM.
That’s it.