New products and their terminology can be daunting, especially when architecture is involved.
To take care of this issue, this article will serve as an introduction to SCNR's distributed terminology, features and the entities that provide them -- bear with me, because there aren't that many.
However, since there are several different possible setups and ways to spawn scanner processes, this post is going to be on the large side, as we'll be taking the long route.
Don't worry though, this doesn't mean that SCNR is complex. Quite the opposite: it lets you avoid complexity and extra infrastructure unless they're absolutely necessary, which is why we aren't going to go for the full monty right off the bat.
To demonstrate, we're going to explore SCNR's distributed features over the REST API and set up a Grid of Agents along with a Scheduler -- this sentence is going to make sense by the end of this article.
None of that is necessary to just perform a scan; it only becomes necessary once there are too many scans for one machine to handle and the workload needs to be distributed across several.
In the end we'll have a SCNR cloud (on a single machine, for demo purposes of course; this doesn't make much sense in real life) that we can easily manage over REST.
This infrastructure will allow us to:
- Queue up multiple scans.
- Those scans will run ASAP -- i.e. when enough slots are/become available somewhere.
- Fire and forget -- the scans will be managed, we won't need to monitor them individually.
Since we're going to be using REST, I've opted to use Ruby to interact with the REST API, so be warned, and keep in mind that I'll be using the following HTTP helper methods:
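As a rough idea of what such helpers can look like, here is a minimal sketch built only on Ruby's standard library. The address returned by `base_url` and the method names (`build_request`, `perform`) are my own placeholders, not something SCNR dictates, and the sketch assumes that the server speaks JSON in both directions.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Assumption: address of the REST server; adjust to your own setup.
def base_url
  'http://127.0.0.1:7331'
end

# Builds (but doesn't send) a request; a body, if given, is JSON-encoded.
def build_request(verb, path, body = nil)
  klass   = { get: Net::HTTP::Get, post: Net::HTTP::Post,
              put: Net::HTTP::Put, delete: Net::HTTP::Delete }.fetch(verb)
  request = klass.new(URI.join("#{base_url}/", path))

  unless body.nil?
    request['Content-Type'] = 'application/json'
    request.body            = JSON.generate(body)
  end

  request
end

# Sends the request; returns [status code, parsed JSON body (nil if empty)].
def perform(verb, path, body = nil)
  request  = build_request(verb, path, body)
  response = Net::HTTP.start(request.uri.host, request.uri.port) do |http|
    http.request(request)
  end

  text = response.body.to_s
  [response.code.to_i, text.empty? ? nil : JSON.parse(text)]
end

def get(path)
  perform(:get, path)
end

def post(path, body)
  perform(:post, path, body)
end

def put(path, body)
  perform(:put, path, body)
end

def delete(path)
  perform(:delete, path)
end
```

Keeping request construction (`build_request`) separate from sending (`perform`) makes it easy to inspect what would go over the wire without having a server running.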
There's not much to the REST server if there doesn't need to be.
Even though it can provide a powerful yet simple middleware interface to control a SCNR cloud, using it to perform multiple scans at the same time is easy as pie.
We're not going to bother with showing progress information, just run the scan and grab the report once it's done.
Basically, this is what's going on:
- POST $OPTIONS /instances -- To spawn a scanner process and start the scan with the given options.
- GET /instances/$INSTANCE_ID -- To get progress data, i.e. is the scan still active?
- GET /instances/$INSTANCE_ID/scan/report.json -- Once the scan is no longer active (i.e. it's done), to retrieve and print out the JSON report.
- DELETE /instances/$INSTANCE_ID -- To shut down the scanner process.
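The four calls above can be strung together roughly like this. It's a sketch under a few assumptions: `api` is any hypothetical object whose `post`, `get` and `delete` methods return `[status, parsed_json]`, the POST response carries the Instance ID under an `id` key, and the progress data flags an active scan with a `busy` key. Check the response format of your own server before relying on those names.

```ruby
# Runs one scan from start to finish and returns the parsed JSON report.
def run_scan(api, options, poll_interval: 1)
  # Spawn a scanner process and start the scan.
  status, created = api.post('instances', options)
  raise "Could not spawn an Instance (HTTP #{status})" unless (200..299).cover?(status)

  id = created.fetch('id') # Assumption: the ID lives under "id".

  begin
    # Poll the progress data until the scan is no longer active.
    loop do
      _, progress = api.get("instances/#{id}")
      break unless progress['busy'] # Assumption: "busy" means still scanning.

      sleep poll_interval
    end

    # The scan is done, grab the report.
    _, report = api.get("instances/#{id}/scan/report.json")
    report
  ensure
    # Always shut down the scanner process, even if something failed above.
    api.delete("instances/#{id}")
  end
end
```

The `ensure` block is there so that a failed poll or report request doesn't leave a scanner process lingering on the machine.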
And you can perform the above either to your heart's content or until the machine can't support any more scans, in which case the POST call will return a 503 Service Unavailable status code -- nothing bad would have happened, it's just the system's way of telling you that it doesn't have any more resources to allocate to yet another scan.
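Since a 503 only means "not right now", one simple way to deal with it is to wait a bit and try again. The sketch below does that with a fixed delay. As before, `api` is a hypothetical object whose `post` returns `[status, parsed_json]`, and the attempt limit and delay are arbitrary values picked for illustration.

```ruby
# Keeps trying to spawn an Instance while the server answers with a 503.
def spawn_with_retry(api, options, max_attempts: 5, delay: 10)
  attempts = 0

  loop do
    attempts += 1
    status, data = api.post('instances', options)

    # Success, hand back whatever the server returned.
    return data if (200..299).cover?(status)

    # Anything other than a 503 is a real error, don't retry it.
    raise "Unexpected HTTP status: #{status}" unless status == 503

    # Out of resources and out of patience.
    raise "REST server still busy after #{attempts} attempts" if attempts >= max_attempts

    sleep delay
  end
end
```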
Really, if all you need is a way to perform a scan via an easy-to-integrate API, or even a few scans in parallel, you can stop reading here; the REST server has got you covered.
That is, as long as the scans run from the same machine as the REST server. If you'd like to control scans over REST but have them originate from a different machine, you need to start an Agent and configure the REST server to use it.
Agents are server entities which supply Instances (i.e. scanner processes).
They can operate individually or be grouped together in a Grid configuration in order to provide load-balancing, but more on that later.
./bin/scnr_agent --port 1111
Now, let's configure our REST server to use it to spawn Instances rather than doing it itself.
Not much to it, just a PUT $URL /agent/url REST call, and from then on Instances are going to be provided by the Agent rather than the REST server itself.
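For illustration, that call could be issued like so. Treat it as a sketch: the REST server address is a placeholder, and I'm assuming that the Agent's URL is sent as a JSON-encoded string in the request body, so double-check the expected body format against the documentation.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Builds the PUT /agent/url request without sending it.
def agent_url_request(rest_url, agent_url)
  request = Net::HTTP::Put.new(URI.join("#{rest_url}/", 'agent/url'))
  request['Content-Type'] = 'application/json'
  request.body            = JSON.generate(agent_url) # Assumption: a JSON string body.
  request
end

# Sends it; from then on the REST server asks the Agent for Instances.
def use_agent(rest_url, agent_url)
  request = agent_url_request(rest_url, agent_url)

  Net::HTTP.start(request.uri.host, request.uri.port) do |http|
    http.request(request)
  end
end

# Usage, with the Agent from above listening on port 1111
# (both addresses are assumptions):
#
#   use_agent('http://127.0.0.1:7331', '127.0.0.1:1111')
```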
The rest of the code remains the same.
Agents can be grouped together to form a load-balancing Grid by using the --peer option upon startup.
It makes no difference which Grid member is assigned to the REST server (i.e. which Grid Agent receives a request to spawn an Instance); load-balancing will take place amongst them and you don't need to worry about it at all.
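To make the Grid setup a bit more concrete, here's a tiny sketch that prints the commands for a three-Agent Grid on one machine. The `--port` and `--peer` options come from above; that `--peer` takes a `host:port` value, and that pointing every new Agent at the first one is enough to join the Grid, are assumptions on my part, as are the port numbers.

```ruby
# Returns one scnr_agent command (as an argv array) per port. Every Agent
# after the first is told to peer with the first one.
def grid_commands(ports, host: '127.0.0.1', bin: './bin/scnr_agent')
  first, *rest = ports

  [[bin, '--port', first.to_s]] +
    rest.map { |port| [bin, '--port', port.to_s, '--peer', "#{host}:#{first}"] }
end

grid_commands([1111, 2222, 3333]).each { |argv| puts argv.join(' ') }
```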
In essence, the Scheduler keeps a list of scan options and, at the first chance, spawns an Instance, configures and monitors it, and then stores the report once the scan completes.
So if you have 1,000 scans to perform, you can post their options to the Scheduler and they will run and complete safely in parallel and ASAP.
By default the Scheduler spawns Instances itself, but it can be configured to use an Agent, in which case the Agent provides them instead.
As usual with Agents, if they're part of a Grid, transparent load-balancing is going to take place.
So, imagine this scenario:
- You have 10,000 scans to run.
- A Grid of 50 Agents.
- A Scheduler configured to use one of the Agents.
- You post all the scan options immediately to the Scheduler.
What will happen is:
- The Scheduler will request as many Instances from its Agent as the Grid can bear.
- The Agent, being part of the Grid, will transparently load-balance its spawning of Instances across its peers.
- The Scheduler will monitor the running scans and, once one completes, start the next one in line.
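If it helps to see that flow in motion, here's a toy model of it. To be clear, this is not SCNR's code or its actual balancing algorithm: each Agent is reduced to a fixed number of slots, new scans go to the least loaded Agent, and on every tick each busy Agent finishes one scan. All of those are simplifying assumptions made purely for illustration.

```ruby
# Toy simulation of a Scheduler feeding a Grid. Expects at least one Agent.
def simulate(scan_count:, agents:, slots_per_agent:)
  queue   = (1..scan_count).to_a       # Scans waiting for a free slot.
  running = Array.new(agents) { [] }   # Scans currently running, per Agent.
  done    = []
  peak    = 0
  ticks   = 0

  until queue.empty? && running.all?(&:empty?)
    # Fill free slots, always picking the least loaded Agent.
    until queue.empty?
      target = running.min_by(&:size)
      break if target.size >= slots_per_agent # The whole Grid is full.

      target << queue.shift
    end

    peak = [peak, running.sum(&:size)].max

    # Pretend that every busy Agent completes one scan per tick.
    running.each do |scans|
      finished = scans.shift
      done << finished if finished
    end

    ticks += 1
  end

  { completed: done.size, peak_running: peak, ticks: ticks }
end

p simulate(scan_count: 10, agents: 2, slots_per_agent: 2)
```

However many scans get queued, the number running at once never exceeds what the Grid can hold, and the queue drains as slots free up.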
So basically you can post as many scans to the Scheduler as you wish, and it'll sort everything out optimally, safely, in parallel and automatically.
In closing, that's about it for a starter course, easy peasy, right? For more information, be sure to read through the documentation.