A full website audit is a difficult and labor-intensive task for most people; however, a tool such as Screaming Frog SEO Spider (SEO Spider) can make it much easier for amateurs and professionals alike. Screaming Frog's user-friendly interface makes for quick and easy operation, but the multitude of configuration and functional options can complicate one's first experience with the software.
The instructions below demonstrate various ways of using Screaming Frog for website audits, keyword and competitor analysis, link building, and other tasks facing SEO, PPC, and other marketing professionals.
Basic Principles of Website Scanning
How to scan an entire website.
By default, Screaming Frog only scans the subdomain that you enter. Any additional subdomain the Spider encounters is treated as an external link. To scan additional subdomains, adjust the settings in the Spider configuration menu. By choosing the option "Crawl All Subdomains" you ensure that the Spider will analyze all links found on your website's subdomains.
How to scan a separate directory.
If you want to limit the scan to a specific folder, just enter the URL and click "Start" without adjusting the default parameters. If you have changed the default settings, you can reset them from the "File" menu.
If you want to start the scanning from a specific folder, but then move on to the analysis of the rest of the subdomain, first go to the Spider’s “Configuration” section and select the option “Crawl Outside of Start Folder” before beginning the work.
How to scan a selection of specific subdomains or subdirectories?
To work with a specific set of subdomains or subdirectories, you can use RegEx to set up "Include" or "Exclude" rules for specific elements in the "Configuration" menu.
The example below shows how all pages of the havaianas.com website were selected for scanning except the "About" pages on each subdomain (exclusion). The next example shows how to scan only the English-language subdomain pages of the same website (inclusion).
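As an illustrative sketch, this is how such include and exclude expressions behave; the domain, paths, and patterns below are assumptions for demonstration, not the real havaianas.com configuration. Testing patterns in Python first can save a wasted crawl:

```python
import re

# Hypothetical patterns of the kind you might paste into Screaming Frog's
# Include/Exclude settings (URLs and patterns are illustrative assumptions).
exclude_pattern = re.compile(r".*/about/.*")                    # skip every "About" page
include_pattern = re.compile(r"https?://us\.example\.com/.*")   # English subdomain only

urls = [
    "https://us.example.com/products/flip-flops",
    "https://us.example.com/about/company",
    "https://br.example.com/produtos",
]

# A URL is crawled only if it matches the include pattern
# and does not match the exclude pattern.
crawlable = [u for u in urls
             if include_pattern.match(u) and not exclude_pattern.match(u)]
print(crawlable)
```

Here only the US product page survives both rules: the "About" page is excluded, and the Brazilian subdomain never matches the include pattern.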
If you need to scan a list of all pages of your website.
Advice: if you intend to use the configured settings for future scans, Screaming Frog allows you to save the configuration options.
If you need to scan a list of all pages in a specific subdirectory.
If you need to scan a list of domains that your client has just redirected to their commercial website.
Enter the URL of the website in ReverseInter.net, then click the link in the upper table to find websites that use the same IP address, DNS server, or GA code.
Next, you can find all links with the "Visit Website" anchor using the Scraper extension for Google Chrome. If Scraper is already installed, launch it by right-clicking anywhere on the page and selecting "Scrape Similar". In the pop-up window, replace the XPath query with the following: //a[text()='visit site']/@href.
Then click "Scrape" and afterwards "Export to Google Docs". You will then be able to save the list as a .csv file from the Google Docs spreadsheet.
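To see what that XPath expression actually selects, here is a minimal Python sketch that applies the same logic to a made-up HTML fragment (the table content is an assumption; the stdlib ElementTree API is used here instead of a full XPath engine):

```python
import xml.etree.ElementTree as ET

# A made-up fragment imitating a ReverseInter.net-style results table.
fragment = """
<table>
  <tr><td>example.com</td><td><a href="http://example.com">visit site</a></td></tr>
  <tr><td>example.org</td><td><a href="http://example.org">visit site</a></td></tr>
</table>
"""

root = ET.fromstring(fragment)
# Equivalent of //a[text()='visit site']/@href:
# every <a> whose text is "visit site", collecting its href attribute.
hrefs = [a.get("href") for a in root.iter("a") if a.text == "visit site"]
print(hrefs)
```

The Scraper extension does the same extraction directly in the browser; this snippet is only meant to demystify the query.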
After that, you can upload this list to the Spider in "List" mode and launch the scan. When the Spider completes the scan, you'll see the relevant statuses in the "Internal" tab, or you can go to "Response Codes" and filter the results using the "Redirection" option to see all domains that were redirected to the commercial website or elsewhere.
Please note that when uploading .csv files to Screaming Frog you should select the "CSV" file type, otherwise the software might produce an error.
Advice: You can also use this method to identify domains that refer to competitors and determine how they were used.
How to find all subdomains of a website and check internal links?
Enter the domain's root URL into ReverseInternet and click the "Subdomains" tab to see the list of subdomains.
Then activate "Scrape Similar" to collect the list of URLs using the following XPath query: //a[text()='visit site']/@href.
Export the results in .csv format, then upload the CSV file to Screaming Frog in "List" mode. When the Spider completes its work, you'll be able to see status codes, as well as any links on the subdomain pages, anchor occurrences, and even duplicate page titles.
How to scan a commercial or any other large website.
Advice: In the past, scanning large websites could take a very long time, but Screaming Frog now allows you to pause a crawl under high memory usage. This valuable option lets you save the results obtained before the software might crash and then increase the memory allocation.
This option is now turned on by default, but if you plan to scan a large website, it's best to make sure that the "Pause On High Memory Usage" box in the "Advanced" tab of the Spider configuration menu is checked.
How to scan a website hosted on an old server.
Old servers may sometimes be unable to process the default number of URL requests per second. To change the scanning speed, open the "Speed" section in the "Configuration" menu and select the maximum number of threads to run simultaneously in the pop-up window. In the same window you can also select the maximum number of URLs requested per second.
Advice: If you see a large number of server errors in the scan results, go to the "Advanced" tab in the Spider configuration menu and increase the "Response Timeout" value and the number of "5xx Response Retries". This will give you better results.
How to scan a site that requires cookies.
Although search robots don't accept cookies, if you need cookies to be allowed when scanning a website, simply select "Allow Cookies" in the "Advanced" tab of the "Configuration" menu.
How to scan a site using a proxy or a different user agent.
Select "Proxy" in the configuration menu and enter the relevant information. To scan using a different agent, select "User Agent" in the configuration menu, then choose a search bot from the dropdown menu or enter its name.
How to scan sites that require authorization.
When the Screaming Frog Spider accesses a page that requests identification, a window pops up asking you to enter a login and password.
To avoid this procedure in the future, uncheck the “Request Authentication” box in the “Advanced” tab of the configuration menu.
What to do if you need information on a website's external and internal links (anchors, directives, interlinking, etc.).
After the Spider completes the scan, use the "Advanced Export" menu to export a CSV from the "All Links" report. It will provide all link locations along with the relevant anchor text, directives, and so on.
To quickly count the links on each page, go to the "Internal" tab and sort the results by the "Outlinks" column. Any page with more than 100 outlinks might require further attention.
How to find broken internal links to a page or a site.
Once the Spider completes the scan, filter the results in the "Internal" tab by "Status Code". All 404, 301, and other status codes will be clearly visible.
In the bottom part of the window you can see details for each URL by clicking it in the scan results. Clicking "In Links" in the bottom window shows a list of pages linking to the selected URL, as well as the anchor text and directives used on those pages. Use this feature to identify internal links that need to be updated.
To export the list of pages with broken links or redirects in CSV format, use the "Redirection (3xx) In Links", "Client Error (4xx) In Links", or "Server Error (5xx) In Links" option in the "Advanced Export" menu.
How to find broken outgoing links on a page or a site (or all broken links at once).
Same as before, first focus on scanning HTML-content, making sure to leave the “Check External Links” box checked.
After the scan is complete, select the "External" tab in the upper window and filter the contents by "Status Code" to find URLs with status codes other than 200. Click any URL in the scan results and then select the "In Links" tab in the bottom window; you'll find a list of pages that link to the selected URL. Use this information to identify links that need to be updated.
To get the full list of locations and anchor text of outgoing links, select "All Out Links" in the "Advanced Export" menu, then filter the "Destination" column of the exported CSV to exclude your domain.
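If you prefer to script that filtering step, a minimal Python sketch looks like this; the column name "Destination" matches the export described above, while the sample rows and the domain are assumptions:

```python
# Sample rows standing in for the exported "All Out Links" CSV
# (in practice you would read them with csv.DictReader from the file).
rows = [
    {"Source": "https://yourdomain.com/a", "Destination": "https://yourdomain.com/b"},
    {"Source": "https://yourdomain.com/a", "Destination": "https://partner.example/page"},
]

# Keep only rows whose destination is outside your own domain.
external = [r for r in rows if "yourdomain.com" not in r["Destination"]]
print([r["Destination"] for r in external])
```

The same substring filter is what you would apply in the spreadsheet's "Destination" column by hand.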
How to find redirecting links.
After completing the scan, select the "Response Codes" tab in the upper window and filter the results using the "Redirection (3xx)" option. This gives you the list of all internal and outgoing links that redirect. Using the "Status Code" filter, you can break down the results by type. Clicking "In Links" in the bottom window shows all pages on which the redirecting links are used.
If you export data directly from this tab, you'll only get the data shown in the upper window (the original URL, the status code, and the redirect destination).
To export the full list of pages with redirecting links, select "Redirection (3xx) In Links" in the "Advanced Export" menu. This produces a CSV file that includes the locations of all redirecting links. To display only internal redirects, filter the "Destination" column of the CSV by your domain.
Advice: Use VLOOKUP across the two exported files to compare the "Source" and "Destination" columns with the location of the final URL.
An example of the formula looks as follows:
=VLOOKUP([@Destination],'response_codes_redirection_(3xx).csv'!$A$3:$F$50,6,FALSE), where "response_codes_redirection_(3xx).csv" is the CSV file that contains the redirecting URLs, and "50" is the number of rows in that file.
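If Excel is not your tool of choice, the same lookup can be sketched in Python: build a dictionary from redirecting URL to its destination, then look up each address. The column names are assumptions based on Screaming Frog's export format, and the sample rows are invented:

```python
# Rows standing in for response_codes_redirection_(3xx).csv
# (in practice, read them with csv.DictReader from the exported file).
redirects = [
    {"Address": "https://site.com/old",  "Redirect URI": "https://site.com/new"},
    {"Address": "https://site.com/old2", "Redirect URI": "https://site.com/new2"},
]

# The dictionary plays the role of the VLOOKUP table:
# key = redirecting URL, value = where it leads.
final_destination = {r["Address"]: r["Redirect URI"] for r in redirects}
print(final_destination["https://site.com/old"])
```

Unlike VLOOKUP, a dictionary lookup does not depend on row counts or column positions, so the formula does not break when the export grows.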
How to identify pages with non-informative content (so-called “thin content”).
After the Spider completes its work, go to the "Internal" tab, filter by HTML, then scroll right to the "Word Count" column. Sort it from low to high to reveal the pages with the least text. You can drag the "Word Count" column to the left, next to the URLs, to make the data easier to read. Click the "Export" button in the "Internal" tab if you prefer to work with the data in CSV format.
Remember that Word Count measures the volume of published text, but gives no indication of whether that text consists merely of product/service names or of a block optimized for keywords.
If you need to get a list of image links from specific pages.
If you have already scanned an entire website or a separate folder, just select a page in the upper window, then click "Image Info" in the bottom window to see the images found on that page. They will be listed in the "To" column.
Advice: Right-click on any item in the bottom window to copy or open the URL-address.
You can review the images on a single page by scanning that particular URL. Make sure to set the crawl depth in the Spider configuration settings to "1". After the page is scanned, go to the "Images" tab, where you'll find every image the Spider discovered.
Finally, if you prefer CSV, use the "All Image Alt Text" option of the "Advanced Export" menu to get a list of all images, their locations, and any alternative text associated with them.
How to find images with no alternative text or images with long alt-text.
First of all, make sure that "Check Images" is selected in the Spider's "Configuration" menu. After the scan is complete, go to the "Images" tab and filter the content using the "Missing Alt Text" or "Alt Text Over 100 Characters" option. Clicking the "Image Info" tab in the bottom window shows all pages where the selected image appears; they will be listed in the "From" column.
Alternatively, you can save time by exporting "All Image Alt Text" or "Images Missing Alt Text" to CSV format from the "Advanced Export" menu.
How to find every CSS-file on the website.
Before scanning, select "Check CSS" in the Spider configuration menu. After the process is complete, filter the results in the "Internal" tab using the "CSS" option.
How to reveal all jQuery plugins used on the site and their locations.
You can also use the "Advanced Export" menu to export "All Links" to CSV and filter the "Destination" column to display only URLs containing "jquery".
Advice: Not every use of jQuery is bad for SEO. If you see a site that uses jQuery, make sure that the content you want indexed is included in the page's source code and rendered on page load rather than afterwards. If you're not sure, read about the particular plugin to learn more about how it works.
How to determine where Flash is embedded on a website.
Before the scan, select "Check SWF" in the configuration menu. After the Spider completes its work, filter the results in the "Internal" tab by "Flash".
How to find internal PDF documents on a site.
After the scan is complete, filter the Spider's results using the "PDF" option in the "Internal" tab.
How to reveal content segmentation within a site or a group of pages.
If you want to find pages on a site that contain a specific type of content, set up a custom filter for an HTML footprint unique to those pages. This must be done before launching the Spider.
How to find pages that contain social sharing buttons.
To do so, set up a custom filter before launching the Spider: go to the configuration menu, click "Custom", and enter any fragment of code from the page's source.
In the following example, the goal was to find pages containing the Facebook "Like" button, so a filter for "http://www.facebook.com/plugins/like.php" was created.
How to find pages that use iframes.
To do so, create a custom filter for the <iframe> tag.
How to find pages that contain embedded video or audio content.
To do so, set up a custom filter for a fragment of the embed code for YouTube or any other media player used on the site.
Meta Data and Directives
How to find pages with long, short or missing titles, meta description or meta keywords.
After the scan is complete, go to the "Page Titles" tab and filter the contents by "Over 70 Characters" to see excessively long page titles. The same can be done in the "Meta Description" and "URL" tabs. A similar procedure helps you find pages with missing or short titles and metadata.
How to find pages with duplicate titles, meta description or meta keywords.
After the scan is complete, go to the "Page Titles" tab and filter the contents by "Duplicate" to see duplicate page titles. The same can be done in the "Meta Description" and "URL" tabs.
How to find duplicate content and/or URL that shall be redirected/rewritten/canonicalized.
After the Spider completes its work, go to the "URL" tab and filter the results by "Underscores", "Uppercase", or "Non ASCII Characters" to reveal URLs that could be rewritten to a more standard structure. Filter by "Duplicate" to see pages that have several URL versions, and use the "Dynamic" filter to reveal URLs that include parameters.
Additionally, if you go to the "Internal" tab, apply the "HTML" filter, and scroll right to the "Hash" column, you'll see a unique sequence of letters and numbers for every page. If you click "Export", you can use conditional formatting in Excel to highlight duplicated values in this column, which indicate identical pages that should be reviewed.
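The same duplicate check can be done in a few lines of Python by grouping URLs by their hash value; the sample URL/hash pairs below are assumptions standing in for the exported "Internal" data:

```python
import collections

# (url, hash) pairs standing in for the "Address" and "Hash" columns
# of the exported "Internal" CSV.
pages = [
    ("https://site.com/a",      "ab12"),
    ("https://site.com/b",      "cd34"),
    ("https://site.com/a-copy", "ab12"),
]

# Group URLs by hash; any hash with more than one URL means
# byte-identical pages that should be reviewed.
by_hash = collections.defaultdict(list)
for url, page_hash in pages:
    by_hash[page_hash].append(url)

duplicates = {h: urls for h, urls in by_hash.items() if len(urls) > 1}
print(duplicates)
```

This reproduces what Excel's conditional formatting highlights, but gives you the duplicate groups directly instead of colored cells.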
How to determine pages that contain meta-directives.
After the scan, go to the "Directives" tab. To see the type of directive, simply scroll right and check which columns are filled in. Alternatively, you can use the filter to find a specific directive tag.
How to determine that the robots.txt file is not working properly.
By default, Screaming Frog complies with robots.txt. As a priority, it follows directives made specifically for its own user agent. If those are missing, the Spider follows the directives for Googlebot, and if there are no special directives for Googlebot, it follows the global directives for all user agents. The Spider uses only one set of directives, ignoring the rest.
If you need to block certain parts of the site from the Spider, use the normal robots.txt syntax with the Screaming Frog SEO Spider user agent. If you want to ignore robots.txt, simply select the relevant option in the software configuration menu.
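For example, a robots.txt fragment along these lines (the blocked path is purely illustrative) would hide a section from the SEO Spider alone, while leaving other crawlers unaffected:

```
User-agent: Screaming Frog SEO Spider
Disallow: /private-section/
```

Because the Spider picks the most specific matching user-agent block, this rule takes priority over any generic `User-agent: *` directives in the same file.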
How to find and verify Schema markup or other microdata on a site.
To find every page with Schema markup or other microdata, you need custom filters. Click "Custom" in the "Configuration" menu and enter the footprint you're looking for.
To find every page with Schema markup, simply add the following fragment of code to a custom filter: itemtype=http://schema.org.
To find a particular markup type, you'll have to be more specific. For example, the custom filter <span itemprop="ratingValue"> finds all pages with Schema markup for ratings.
You can set up to 5 different filters per scan. Then just click "OK" and let the software scan the site or the list of pages.
When the Spider completes its work, select the "Custom" tab in the upper window to see all pages containing the footprint you were looking for. If you set up more than one custom filter, you can view the results one by one by switching between filters in the scan results.
How to create a Sitemap in XML.
After Spider completes the scan of your website, click on “Advanced Export” and select “XML Sitemap”.
Save your sitemap, then open it in Excel. Choose "Read Only" and open the file "As an XML table". A message may pop up saying that some schemas cannot be mapped; simply click "Yes".
Once the sitemap is displayed as a table, you can easily change the frequency, priority, and other settings. Make sure the sitemap contains only one preferred (canonical) version of each URL, without parameters or other duplicating factors.
After making any changes, save the file in XML format.
How to check your existing XML sitemap file.
First, make a copy of the sitemap on your computer. You can save any live sitemap by clicking its URL and saving the file, or by importing it into Excel.
Next, go to the "Mode" section of the Screaming Frog menu and select "List". Then click "Select File" at the top of the page, choose your file, and start the scan. Once the Spider completes its work, you will be able to see any redirects, 404 errors, duplicate URLs, and so on in the "Internal" tab.
General Recommendations on Troubleshooting
How to determine why some sections of your site are not indexed or ranked.
Wondering why some pages are not indexed? First, make sure they are not blocked in robots.txt or marked "noindex". Second, make sure spiders can reach the pages by checking your internal links. After the Spider completes the scan of your site, export the list of internal links as a CSV file using the "HTML" filter in the "Internal" tab.
Open the CSV file and, on a second sheet, paste the list of URLs that are not indexed or ranked. Use VLOOKUP to check whether these "problem" URLs appear in the crawl results.
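The same check is a simple set membership test in Python; the crawled and "problem" URLs below are invented sample data standing in for the two CSV columns:

```python
# URLs found by the crawl (stand-in for the exported "Internal" CSV).
crawled = {
    "https://site.com/",
    "https://site.com/products",
}

# URLs that are not indexed or ranked (stand-in for your second sheet).
problem_urls = [
    "https://site.com/products",
    "https://site.com/orphan-page",
]

# Problem URLs absent from the crawl are likely unreachable
# through internal links (orphan pages).
not_crawled = [u for u in problem_urls if u not in crawled]
print(not_crawled)
```

A "problem" URL that never appears in the crawl results is a strong hint that no internal link points to it.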
How to check, if the site transfer/redesign was successful.
You can use Screaming Frog to check whether old URLs have been redirected. "List" mode helps with this, as it lets you check status codes. If old URLs return 404 errors, you'll know exactly which of them still need redirects.
How to find slow loading pages of a site.
After the scan process is complete, go to the "Response Codes" tab and sort the "Response Time" column in descending order to see the pages that may suffer from slow loading speed.
How to find malware or spam on a site.
First, you need to identify the footprint left by the malware or spam. Then click "Custom" in the configuration menu and enter the footprint you're looking for. You can check up to 5 footprints per scan. Enter all that are necessary and click "OK" to analyze the entire site or a list of its pages.
After the process is complete, go to the "Custom" tab in the upper window to see all pages on which the specified traces of fraudulent or malicious code were found. If you entered more than one custom filter, the results for each are shown in a separate window; switch between the filters to view them.
PPC and Analytics
How to check a list of all URLs used in PPC advertising at once.
Save the list of addresses in .txt or .csv format and switch the "Mode" setting to "List". Then select your file to upload and click "Start". Check the status code of each page in the "Internal" tab.
How to collect metadata from a number of pages.
Do you have a lot of URLs for which you need to gather as much data as possible? Switch to "List" mode, then upload the list of addresses in .txt or .csv format. After the Spider completes its work, you'll be able to see status codes, outgoing links, word counts, and, of course, metadata for each page in your list.
How to scrape the site for all pages that contain a particular marker.
First, you'll need to define the footprint itself and determine exactly what you are looking for. Then click "Custom" in the "Configuration" menu and enter the footprint you need. Remember that you can enter up to 5 different footprints. Finally, click "OK" to launch the scan and filter site pages by the presence of the specified footprints.
The following example shows a situation where you need to find all pages that contain the phrase "Please Call" in sections related to product prices. To do this, the relevant HTML was located in the page's source code and copied into the filter.
After the scan is complete, go to the "Custom" section in the upper window to review the list of all pages containing the specified footprint. If more than one footprint was entered, the information for each is shown in a separate window.
Advice: This method is also useful when you don't have direct access to a site. That said, if you need data from a client's website, it is usually easier and faster to ask them to retrieve it from the database and provide it to you.
How to find and remove session IDs or other parameters from crawled URLs.
To identify URLs with session IDs or other parameters, simply scan the site with the default settings. After the Spider completes its work, go to the "URL" tab and apply the "Dynamic" filter to see all URLs that contain parameters.
To remove parameters from the crawled URLs, select "URL Rewriting" in the configuration menu. Then click "Add" in the "Remove Parameters" tab, add the parameters you wish to strip from URLs, and click "OK". You will need to restart the Spider for the changes to take effect.
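What "Remove Parameters" does can be sketched in a few lines of Python with the standard urllib tools; the URL and the parameter name `sessionid` are illustrative assumptions:

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

def strip_params(url, params_to_remove):
    """Remove the named query parameters from a URL, keeping the rest."""
    parts = urlparse(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query)
            if k not in params_to_remove]
    return urlunparse(parts._replace(query=urlencode(kept)))

print(strip_params("https://site.com/page?sessionid=abc123&color=red",
                   {"sessionid"}))
```

Session IDs vanish while meaningful parameters such as `color` survive, which mirrors how the Spider reports the cleaned URLs after a restart.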
How to rewrite crawled URLs (for example, change .com to .co.uk, or convert all URLs to lowercase).
To rewrite any address processed by the Spider, select "URL Rewriting" in the configuration menu, then click "Add" in the "Regex Replace" tab and add the RegEx for what you need to change.
Once you have made all the required changes, you can check them in the "Test" tab by entering a test URL in the "URL before rewriting" field. The "URL after rewriting" line will update automatically according to your rules.
If you need to rewrite all URLs in lowercase, simply select "Lowercase discovered URLs" in the "Options" tab.
Don't forget to restart the Spider after making any changes for them to take effect.
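The kind of substitution "Regex Replace" performs can be illustrated in Python; the domain names are assumptions for the .com-to-.co.uk example mentioned above:

```python
import re

# A regex replacement of the kind "Regex Replace" applies to every
# discovered URL: swap the .com domain for its .co.uk counterpart.
url = "https://example.com/products/Page"
rewritten = re.sub(r"example\.com", "example.co.uk", url)
print(rewritten)

# "Lowercase discovered URLs" is simply a lowercase pass on top.
print(rewritten.lower())
```

Testing a pattern like this before pasting it into the "Regex Replace" tab helps catch unescaped dots and other regex pitfalls.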
How to find out, which pages of competitors’ sites have the highest value.
In general, competitors will try to spread link popularity and drive traffic to their most valuable pages through internal linking. Any competitor that pays attention to SEO will also link generously from the corporate blog to the most important pages of the website.
Find the most valuable pages of a competitor's site by crawling it, then go to the "Internal" tab and sort the "Inlinks" column in descending order to see which pages have the most internal links.
To see the pages linked from the competitor's corporate blog, uncheck the "Check links outside folder" box in the Spider configuration menu and crawl the blog's folder/subdomain. Then filter the results in the "External" panel by searching for the URL of the main domain. Scroll right and sort the list by the "Inlinks" column to see which pages are linked most frequently.
Advice: For more convenient use of the table, drag and drop columns to the left or to the right.
How to find out which anchors competitors use for internal linking.
Select "All Anchor Text" in the "Advanced Export" menu to export a CSV containing all anchor text used on the website, where it appears, and what it links to.
How to find out which meta keywords competitors use on their websites.
After the Spider completes the scan, look at the "Meta Keywords" tab to review the list of meta keywords found on each page. Sort the "Meta Keyword 1" column alphabetically to make the data easier to scan.
How to analyze potential locations to place links.
Once the list of URL-addresses is collected, you can scan it in the “List” mode to collect as much data as possible about pages. After the scan is completed, check status codes in the “Response Codes” toolbox and study outgoing links, their types, anchor entrances and directives in the “Out Links” toolbox. This will give you an idea about what sites refer to these pages and how they do it.
To review the “Out Links” toolbox make sure that the URL you’re interested in is selected in the upper window.
You will probably want to use custom filters to determine if there are links already in those locations.
You can also export the full list of links by clicking the “All Out Links” option in the “Advanced Export Menu” toolbox. This will allow to not only receive the links that lead to external sites, but also show internal links by separate pages in your list.
How to find broken links for external advertisement.
Suppose there is a website from which you would like to receive links to your own site. Using Screaming Frog, you can find broken links to its pages (or to the site as a whole), then contact the owner and offer to replace the broken links with links to your website where appropriate.
How to check backlinks and review anchors.
Upload your list of backlinks and launch the Spider in "List" mode. Then export the full list of external links by selecting "All Out Links" in the "Advanced Export" menu. This will give you the URLs and anchor/alt text of all links on those pages. You can then filter the "Destination" column of the CSV file to determine whether your site is linked and which anchor/alt text is used.
How to make sure that backlinks were deleted successfully.
To do so, set up a custom filter containing your root domain URL, then upload your list of backlinks and launch the Spider in "List" mode. After the scan is complete, go to the "Custom" tab to review the list of pages that still link to you.
Advice: Remember that by right-clicking any URL in the upper window of the scan results, you can, among other things:
- Copy or open the URL-address.
- Launch the repeat scan of the address or delete it from the list.
- Export data on the URL or images available on that page, incoming and outgoing links.
- Check whether the page is indexed in Google, Bing, and Yahoo.
- Check the page backlinks in Majestic, OSE, Ahrefs and Blekko.
- Review the cached version of the page.
- Review old versions of pages.
- Open robots.txt for the domain, where the page is located.
- Launch a search for other domains with the same IP.
We have now covered a great many aspects of Screaming Frog. We hope this detailed guide will make your site audits easier and more effective, saving you a lot of time.