4 ways to keep PDF docs off Google search results

If you offer documentation as both web pages and PDFs, then there’s an annoying consequence for readers who search for your docs: Google might rank your PDFs as high as or higher than your regular web pages. When a search engine results page ranks your PDFs highly, it draws traffic away from your freshest, most accessible documentation. It funnels traffic to a format that is “unfit for human consumption” on the web.

But you might not be able to get rid of those PDFs just yet. Maybe you need a print-only version of your docs or you can’t yet produce better offline formats (such as EPUB). Wouldn’t it be nice to continue to offer PDFs without compromising search results?

It’s SEO, but not the underhanded kind

You can’t manipulate search results directly, but it’s possible to give hints to Google and other search engines to stop them from driving traffic to your PDFs. It’s a little bit of search engine optimization, but not the underhanded kind.

There are at least four methods that can help:

The canonical method: tell search engines to treat a PDF URL as equivalent to another URL using a special HTTP response header.
The noindex method: tell search engines to ignore a PDF URL using an HTTP response header.
The no-robots method: try to hide a URL from Google and other web crawlers using changes to your robots.txt, sitemap, and links.
The password method: password protect the PDFs themselves, so Google ignores them.

Read the following sections to learn when each method works best and how to use it. And, whichever method you choose, don’t miss the general tips at the end.

The canonical method: tell Google to prefer web pages over PDFs

The first and best method is to tell search crawlers that there’s another, better URL for search results: the canonical link.

How it works: Set the Link header with the rel="canonical" parameter in responses to requests for PDFs. It’s like using a <link rel="canonical"> tag in an HTML file, but for non-HTML files.

Advantages: This method usually preserves or strengthens your search results ranking. When you serve a PDF with a canonical URL, the PDF URL’s ranking is added to the canonical URL’s ranking. In its index of sites, Google consolidates the two URLs and prefers to show the canonical URL in search results.

Disadvantages: This method assumes that there’s a single, preferred alternative to your PDF (such as a web-friendly page or a PDF gateway page) and that you have control over your web server or site hosting configuration. If you can’t satisfy both of these requirements, then you’ll need to use another method.

Example: Suppose you serve a PDF at /assets/user-guide-v3.2.1.pdf, which is a print alternative to web content at another URL. Serve the PDF with the following HTTP header:

    for = "/assets/user-guide-v3.2.1.pdf"
    Link = '<URL of the web content>; rel="canonical"'

Further reading: To learn more about how Google interprets this method, read Google’s Consolidate duplicate URLs docs.

The noindex method: tell Google to ignore your PDFs

The next best option is to tell search engines to explicitly ignore your PDFs with the noindex header value. This method is like the canonical method, except that it drops a URL from search indexes instead of consolidating it with another URL.

How it works: Set the X-Robots-Tag header with the noindex value in responses to requests for PDFs. This is like using a <meta name="robots" content="noindex"> tag in an HTML file, but for non-HTML files.

Advantages: This method usually removes a PDF from search results. It works well in situations where you’re continuing to serve a PDF with no web-friendly alternative and you don’t want new readers to stumble upon the PDFs. This method also works well when you need a one-size-fits-all fix for PDFs in search results. If you have many PDFs and it’s impractical to figure out a canonical URL for each, then dropping them from search might be a practical solution.

Disadvantages: Like the canonical method, this method assumes you have control over your web server or site hosting configuration (though it will probably be less fussy for bulk use). If you can’t change your server’s headers, then you’ll need to use another method.

Example: Suppose you serve many PDFs and you want to remove them from search results. Serve the PDFs with the following HTTP header:

    for = "*.pdf"
    X-Robots-Tag = "noindex"

Further reading: To learn more about how Google interprets this method, read Google’s Block Search Indexing with ‘noindex’ docs.

The no-robots method: hide your PDF URLs from Google (if you can)

The next option is to obscure your PDFs from Google’s gaze in the hopes that it will de-rank PDFs in search results.

How it works: Minimize search crawlers’ ability to find URLs for PDFs and block them from visiting the URLs they do find. This combines a few individual techniques:

Use a robots.txt Disallow directive to tell crawlers not to visit your PDFs.
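The two header-based methods in this article (canonical and noindex) can be sketched in a few lines of Python. This is only an illustration, not any particular server's or host's configuration: the function name `pdf_search_headers`, the `CANONICAL_URLS` table, and the `https://example.com/...` URL are all hypothetical. The idea is that a PDF with a known web-friendly alternative gets a canonical `Link` header, and every other PDF falls back to `X-Robots-Tag: noindex`.

```python
from pathlib import PurePosixPath
from typing import Dict

# Hypothetical table: PDF path -> preferred web page for that PDF.
# The example.com URL is a placeholder, not a value from the article.
CANONICAL_URLS: Dict[str, str] = {
    "/assets/user-guide-v3.2.1.pdf": "https://example.com/docs/user-guide/",
}


def pdf_search_headers(path: str) -> Dict[str, str]:
    """Return extra HTTP response headers for a request path.

    Non-PDF paths get no extra headers. PDFs with a known canonical
    URL get a Link header; all other PDFs get X-Robots-Tag: noindex.
    """
    # Only PDF responses need search-engine hints.
    if PurePosixPath(path).suffix.lower() != ".pdf":
        return {}

    canonical = CANONICAL_URLS.get(path)
    if canonical is not None:
        # Canonical method: point crawlers at the preferred URL.
        return {"Link": f'<{canonical}>; rel="canonical"'}

    # Noindex method: ask crawlers to drop this URL from their index.
    return {"X-Robots-Tag": "noindex"}


if __name__ == "__main__":
    for p in ("/assets/user-guide-v3.2.1.pdf", "/assets/other.pdf", "/docs/"):
        print(p, pdf_search_headers(p))
```

Preferring the canonical header when a mapping exists follows the article's ordering of the methods: consolidating a PDF's ranking into a web page is described as the best option, with noindex as the bulk fallback.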