If you’ve reached this page, your site has most likely been indexed by NetSeer’s spider. We’re in the business of understanding human intent, and our spider helps us better ascertain how content, language and ideas all fit together. If our crawl was unwanted, we apologize for the intrusion; the instructions below explain how to limit our spider’s access to your website.
We’re the originator of Concept targeting, which uses our patented ConceptGraph™ intent engine to power ad solutions for advertisers and publishers.
NetSeer’s spider crawls the Web by starting from a few well-known entry points and recursively following links. By indexing the pages it visits, we are able to better understand how content on the Web is connected and what it actually means. NetSeer follows W3C guidelines regarding web crawling, which means you can prevent our spider from indexing pages you want to keep private, or from following links within your website. The instructions below explain how to configure your website to restrict spider access. If you have any further questions about the NetSeer spider, please feel free to contact us.
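The crawl described above is, in general terms, a graph traversal. As a rough illustration only (this is not NetSeer’s implementation), the Python sketch below walks a hypothetical in-memory link graph breadth-first from a list of seed URLs, which is the iterative equivalent of recursively following links. The `LINKS` table, the `a.example` and `b.example` sites, and the `get_links` and `allowed` hooks are all made up for the example; a real crawler would fetch pages over HTTP and consult each site’s robots.txt instead.

```python
from collections import deque
from typing import Callable, Iterable, List

# Hypothetical in-memory "web": each URL maps to the links found on that page.
LINKS = {
    "http://a.example/": ["http://a.example/about", "http://b.example/"],
    "http://a.example/about": ["http://a.example/"],
    "http://b.example/": ["http://b.example/private/x", "http://a.example/about"],
    "http://b.example/private/x": [],
}


def crawl(
    seeds: Iterable[str],
    get_links: Callable[[str], List[str]],
    allowed: Callable[[str], bool],
    max_pages: int = 100,
) -> List[str]:
    """Breadth-first crawl from the seed URLs, following links transitively.

    get_links(url) returns the outgoing links of a page; allowed(url) reports
    whether the site's exclusion rules permit fetching url. Both are
    hypothetical hooks standing in for HTTP fetching and robots.txt checks.
    Returns the pages visited, in visiting order.
    """
    seeds = list(seeds)
    seen = set(seeds)  # URLs already queued, so each one is visited at most once
    queue = deque(seeds)
    visited: List[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if not allowed(url):
            continue  # excluded by the site owner: not fetched, links not followed
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


if __name__ == "__main__":
    pages = crawl(
        ["http://a.example/"],
        get_links=lambda u: LINKS.get(u, []),
        # Stand-in rule: pretend b.example disallows its /private/ directory.
        allowed=lambda u: "/private/" not in u,
    )
    print(pages)
```

With these made-up inputs the crawl visits the two a.example pages and the b.example home page, and skips the excluded /private/ page without following anything on it.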
Customizing Spider Access to Your Website
We will not crawl anything you have asked us to keep private. Using the Standard for Robot Exclusion (SRE), you can tell NetSeer’s spider, as well as other spiders, not to crawl your site or specific parts of it. There are two techniques for customizing or otherwise limiting spider access to your website: a robots.txt file and in-page META tags.
Using robots.txt: You can place a file named “robots.txt” at the top level of your website, e.g. http://www.mywebsite.com/robots.txt.
This file tells crawlers which directories can or cannot be crawled. Note that the filename is case-sensitive: the file must be named “robots.txt”, in lowercase. We have configured our spider to be a bit more forgiving, but most spiders will only respect this file if it is correctly named and formatted.
NetSeer’s crawler always retrieves a copy of your robots.txt file before crawling your site.
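As a minimal illustration of where crawlers look for this file, the Python sketch below derives the robots.txt location from any page URL using the standard library’s `urllib.parse`. The function name `robots_url` is hypothetical; the point it shows is that only the file at the root of the host counts, so a robots.txt placed in a subdirectory (for example /blog/robots.txt) is not consulted.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the URL of the robots.txt file that governs page_url.

    Crawlers look for the file only at the root path of the same scheme and
    network location (host and optional port); the page's own path, query
    and fragment are discarded. The path is case-sensitive, so
    "/Robots.txt" would be a different file that most crawlers ignore.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url("http://www.mywebsite.com/images/logo.png?size=2"))
```

For http://www.mywebsite.com/images/logo.png?size=2 this yields http://www.mywebsite.com/robots.txt, the same file no matter how deep the page is.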
To exclude NetSeer’s crawler from your entire site, the robots.txt file should look like this:

User-agent: netseer
Disallow: /
To exclude just one directory (and its subdirectories), say, the /images/ directory, the file should look like this:

User-agent: netseer
Disallow: /images/
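To check what robots.txt rules like the ones above actually permit, you can feed them to Python’s standard-library `urllib.robotparser` module. This is a minimal sketch: the rules are parsed from strings rather than fetched from a live site, `netseer` is the user-agent token from the examples above, and `SomeOtherBot`, the helper name `can_fetch` and the page paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The two example robots.txt files from the text above.
BLOCK_ALL = """\
User-agent: netseer
Disallow: /
"""

BLOCK_IMAGES = """\
User-agent: netseer
Disallow: /images/
"""


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    site = "http://www.mywebsite.com"
    for rules in (BLOCK_ALL, BLOCK_IMAGES):
        for path in ("/about.html", "/images/logo.png"):
            print(path, can_fetch(rules, "netseer", site + path))
    # A crawler that is not named in the file is unaffected by these records.
    print(can_fetch(BLOCK_ALL, "SomeOtherBot", site + "/about.html"))
```

Because both records name only netseer, other crawlers may still fetch every page; a record beginning with “User-agent: *” applies to all robots.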
Visit http://www.robotstxt.org/wc/faq.html for more details on how to instruct robots when they visit your site.
Using a META tag: If you cannot create a robots.txt file, you can also limit spider behavior with a META tag that gives instructions to visiting spiders. Like any META tag, it should be placed in the HEAD section of an HTML page. Put it on every page of your site, since a robot can arrive at any page through a deep link. The “NAME” attribute must be “ROBOTS”. Valid values for the “CONTENT” attribute are “INDEX”, “NOINDEX”, “FOLLOW” and “NOFOLLOW”.
Multiple comma-separated values are allowed, although only some combinations make sense. If a page has no robots META tag, the default is “INDEX, FOLLOW”.
<META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW"> (Don’t index this page, but follow its links)
<META NAME="ROBOTS" CONTENT="INDEX, NOFOLLOW"> (Index this page, but don’t follow its links)
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"> (Don’t index this page or follow its links)
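The META rules above (comma-separated values, with INDEX, FOLLOW as the default when no tag is present) can be expressed as a small Python sketch built on the standard library’s `html.parser`. The names `RobotsMetaParser` and `robots_meta_policy` are hypothetical, and letting the restrictive value win when a page lists contradictory values (such as both INDEX and NOINDEX) is an assumption of this sketch, not something stated above.

```python
from html.parser import HTMLParser
from typing import Set, Tuple


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META NAME=...>
        # arrives here as tag "meta" with attribute "name".
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        for token in attributes.get("content", "").split(","):
            token = token.strip().upper()
            if token:
                self.directives.add(token)


def robots_meta_policy(html_text: str) -> Tuple[bool, bool]:
    """Return (may_index, may_follow) for a page.

    With no robots META tag the result is (True, True), i.e. INDEX, FOLLOW.
    Assumption: if a page lists contradictory values, the restrictive
    NOINDEX / NOFOLLOW wins.
    """
    parser = RobotsMetaParser()
    parser.feed(html_text)
    parser.close()
    may_index = "NOINDEX" not in parser.directives
    may_follow = "NOFOLLOW" not in parser.directives
    return may_index, may_follow


if __name__ == "__main__":
    page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW"></head></html>'
    print(robots_meta_policy(page))
```

Matching is case-insensitive for the tag, attribute names and values, so `<meta name="robots" content="noindex">` is treated the same as the uppercase form shown above.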