Cuckoo Sandbox Architecture
Cuckoo Sandbox (GitHub) is a widely used, advanced, automated malware analysis tool. It consists of multiple modular components that work together to collect the behavioral data of a malware sample and present it to the user.
Cuckoo Sandbox can analyze many kinds of malicious files (executables, Office documents, PDF files, emails, etc.) as well as malicious websites, in virtualized Windows, Linux, Mac OS X, and Android environments.
Because Cuckoo Sandbox is open source and extensively modular, you can customize any aspect of the analysis environment, the processing of analysis results, and the reporting stage. This also makes it straightforward to integrate the sandbox into your existing framework and backend, and to deliver results in whatever way and format you need.
This blog post is the first in a series of blog posts on the ins and outs of Cuckoo Sandbox. We will kick off the series by taking a closer look at the flow of an analysis and the responsibilities of all Cuckoo components.
The flow of an analysis
When a file or URL is submitted, a new entry is created in the database and a task ID is generated. The entry records what the target (the object to be analyzed) is, along with the configured defaults and any analysis options specified for this task.
The Scheduler continuously checks whether any virtual machines (VMs) are available. If one is, it looks for tasks that are pending analysis and selects one, taking task priority into account. The selected task is then handed over to the analysis manager.
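To make the task entry concrete, here is a minimal in-memory sketch of what such a record might hold. The `Task` and `TaskDatabase` names, their fields, and the `timeout` option are hypothetical illustrations; Cuckoo stores tasks in a real database, and this sketch does not reproduce its schema.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Task:
    """Hypothetical task record: the target plus its analysis options."""
    id: int
    target: str                 # file path or URL to analyze
    category: str               # "file" or "url"
    priority: int = 1
    options: dict = field(default_factory=dict)
    status: str = "pending"     # every new task starts out pending


class TaskDatabase:
    """Toy in-memory stand-in for the task database (not Cuckoo's schema)."""

    def __init__(self):
        self._next_id = count(1)  # task IDs are handed out sequentially
        self.tasks = {}

    def add(self, target, category, priority=1, **options):
        task = Task(next(self._next_id), target, category, priority, dict(options))
        self.tasks[task.id] = task
        return task.id


db = TaskDatabase()
task_id = db.add("/tmp/sample.exe", "file", priority=2, timeout=120)
```

The submission code only needs the returned task ID; everything else the later stages need travels with the record.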
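The selection step could look roughly like the function below. It is a sketch under two assumptions that the text does not spell out: a higher number means a higher priority, and ties are broken in favor of the oldest task (lowest ID). The function name and dictionary keys are hypothetical.

```python
def pick_task(free_vms, tasks):
    """Return the pending task to start next, or None.

    Assumption: higher number = higher priority; ties go to the lowest ID."""
    if not free_vms:
        return None  # nothing can run until a VM becomes available
    pending = [t for t in tasks if t["status"] == "pending"]
    if not pending:
        return None
    return max(pending, key=lambda t: (t["priority"], -t["id"]))


tasks = [
    {"id": 1, "priority": 1, "status": "pending"},
    {"id": 2, "priority": 3, "status": "pending"},
    {"id": 3, "priority": 3, "status": "pending"},
    {"id": 4, "priority": 5, "status": "completed"},
]
chosen = pick_task(["cuckoo1"], tasks)
```

Here task 2 wins: task 4 has the highest priority but is no longer pending, and task 2 beats task 3 on age.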
The analysis manager will start by selecting one of the available VMs to use for this task, after which it will start the analysis. As one of the first steps, the result server is informed about the new task, so that it can keep track of all collected data that is uploaded to it.
Before the VM is started, all auxiliary (supporting) modules are started. Once the VM is running, the guest manager, which is responsible for communicating with the agent, uploads the analyzer, monitor, configuration, and target to the agent inside the machine. The agent starts the analyzer, which in turn starts or opens the target and injects the monitor into it.
While the target is being executed, the monitor and analyzer send the collected behavioral information back to the result server. Meanwhile, on the host, the analysis manager waits until the guest manager determines that the target is done, either because the analyzer has stopped or because a critical timeout has been reached.
When the target has finished running, the analysis manager stops the machine and all auxiliary modules. Once these have terminated, the processing modules turn the collected behavioral information into usable results. These results are then checked against all available signatures. Finally, all reporting modules are run on the results; they store the results in formats that are useful to the end user, such as a JSON file or, for the web interface, MongoDB.
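The ordering described above can be condensed into a short control-flow sketch. All component and method names (`add_task`, `start_analyzer`, `wait_for_completion`, and so on) are hypothetical stand-ins, not Cuckoo's actual classes; the point is the sequence, and that the machine and auxiliary modules are stopped even if the analysis fails.

```python
def run_analysis(task_id, machine, machinery, result_server, auxiliaries, guest, post_process):
    """Hypothetical outline of the analysis manager's control flow for one task."""
    result_server.add_task(task_id, machine)         # so uploads can be matched to the task
    for aux in auxiliaries:
        aux.start(task_id)                           # auxiliary modules start before the VM
    try:
        machinery.start(machine)
        guest.start_analyzer(machine, task_id)       # upload analyzer, monitor, config, target
        guest.wait_for_completion(machine, task_id)  # analyzer stopped or critical timeout
    finally:
        machinery.stop(machine)                      # stop the VM even if something failed
        for aux in auxiliaries:
            aux.stop()
    post_process(task_id)                            # processing, signatures, reporting


class Recorder:
    """Stand-in component that logs every method called on it."""

    def __init__(self, name, log):
        self.name, self.log = name, log

    def __getattr__(self, method):
        def call(*args):
            self.log.append(f"{self.name}.{method}")
        return call


log = []
run_analysis(
    1, "cuckoo1",
    Recorder("machinery", log), Recorder("resultserver", log),
    [Recorder("sniffer", log)], Recorder("guest", log),
    lambda tid: log.append("post_process"),
)
```

Running it with `Recorder` stand-ins fills `log` with the call order, which mirrors the flow described in this section.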
The Scheduler
The Scheduler is one of the components that runs continuously. It is responsible for initializing the configured machinery module (VirtualBox, VMware, etc.) and for starting pending tasks when enough resources, such as disk space, are available, without exceeding the configured maximum number of running VMs. Whenever a VM is available, it looks for pending tasks, and when it is ready to start a selected task, it hands the task information over to the analysis manager.
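A resource gate of this kind might look like the following sketch. The parameter names `max_vms` and `min_free_mb` are hypothetical; only `shutil.disk_usage` is a real standard-library call.

```python
import shutil


def can_start_task(running_vms, max_vms, storage_path, min_free_mb):
    """Hypothetical check the scheduler could run before starting another task."""
    if running_vms >= max_vms:
        return False  # configured VM limit already reached
    free_mb = shutil.disk_usage(storage_path).free // (1024 * 1024)
    return free_mb >= min_free_mb  # leave room for dumps, pcaps and reports
```

Checking disk space up front matters because a single analysis can produce large memory dumps and network captures.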
The Analysis Manager
The Analysis Manager is started by the Scheduler and is responsible for the complete analysis flow of a task. It decides when a machine is started or stopped, and whether and when other modules should be started. As soon as it is started, it tries to find a machine that matches the new task; for example, a task might require its target to run in a specific environment or on a specific machine. Before starting the machine itself, the Analysis Manager starts the required auxiliary modules. Once the machine is running, the Guest Manager handles the analysis flow until the analyzer stops or a critical timeout is hit.
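Machine matching can be sketched as a filter over the free machines. The `machine` and `platform` option names and the `locked` flag are hypothetical; the sketch simply locks the first free machine that satisfies every requirement the task specifies.

```python
def acquire_machine(machines, options):
    """Lock and return the first free machine that satisfies the task's options."""
    for m in machines:
        if m["locked"]:
            continue  # already in use by another task
        if "machine" in options and m["name"] != options["machine"]:
            continue
        if "platform" in options and m["platform"] != options["platform"]:
            continue
        m["locked"] = True  # prevent a second task from grabbing the same VM
        return m
    return None  # no match right now; the task has to wait


machines = [
    {"name": "win7", "platform": "windows", "locked": False},
    {"name": "ubuntu", "platform": "linux", "locked": False},
]
```

A task with no requirements takes any free machine, while a task pinned to a busy machine gets `None` and has to wait.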
Auxiliary Modules are modules that must be started before a machine can be started. They can handle all sorts of tasks that need to happen either before or while the machine runs. Examples are the mitmdump and sniffer modules; the sniffer dumps all network traffic generated inside a running machine.
The Machinery Modules are responsible for interacting with the hypervisor or physical machine. These modules start and stop the VM, and restore it to a clean state. One of these modules (VirtualBox by default) is initialized by the Scheduler and used to manage all configured VMs while Cuckoo is running.
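An auxiliary module essentially needs a `start` and a `stop` hook around the machine's lifetime. The sketch below assumes that interface and shows a sniffer that runs `tcpdump` on the host, writing packets for the VM's IP to a pcap file. The class names and the exact tcpdump invocation are illustrative assumptions, not Cuckoo's real sniffer module.

```python
import subprocess
from abc import ABC, abstractmethod


class AuxiliaryModule(ABC):
    """Hypothetical interface: started before the VM, stopped after it."""

    @abstractmethod
    def start(self): ...

    @abstractmethod
    def stop(self): ...


class Sniffer(AuxiliaryModule):
    """Captures the VM's network traffic into a pcap file (illustrative only)."""

    def __init__(self, interface, vm_ip, pcap_path):
        self.interface, self.vm_ip, self.pcap_path = interface, vm_ip, pcap_path
        self.proc = None

    def command(self):
        # -i: capture interface, -w: write raw packets to file; "host IP" filters on the VM.
        return ["tcpdump", "-i", self.interface, "-w", self.pcap_path, "host", self.vm_ip]

    def start(self):
        self.proc = subprocess.Popen(self.command())

    def stop(self):
        if self.proc is not None:  # stopping a module that never started is a no-op
            self.proc.terminate()
            self.proc.wait()
            self.proc = None
```

Keeping the command construction in its own method makes it easy to inspect without actually starting a capture.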
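A machinery module boils down to a small start/stop/restore interface over the hypervisor. The sketch below drives VirtualBox through its `VBoxManage` command-line tool (`snapshot ... restorecurrent`, `startvm ... --type headless`, `controlvm ... poweroff`). The class name and the injectable `run` parameter are hypothetical, and Cuckoo's own VirtualBox module does considerably more (status checks, error handling, snapshot selection).

```python
import subprocess


class VirtualBoxMachinery:
    """Minimal sketch of a machinery module that drives VirtualBox via VBoxManage."""

    def __init__(self, vboxmanage="VBoxManage", run=subprocess.run):
        self.vboxmanage = vboxmanage
        self.run = run  # injectable so the command sequence can be tested without VirtualBox

    def _vbox(self, *args):
        self.run([self.vboxmanage, *args], check=True)

    def restore(self, label):
        # Roll the (powered-off) VM back to its current snapshot, i.e. the clean state.
        self._vbox("snapshot", label, "restorecurrent")

    def start(self, label):
        self.restore(label)
        self._vbox("startvm", label, "--type", "headless")

    def stop(self, label):
        self._vbox("controlvm", label, "poweroff")
```

Restoring before every start means each analysis begins from the same clean snapshot, regardless of what the previous sample did.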
The Guest Manager
The Guest Manager is responsible for communicating with the agent. It waits until the machine has started, then uploads everything and starts the analyzer. While the analyzer runs, the Guest Manager repeatedly asks the agent whether the analyzer has reported that it is done. If a critical timeout is reached, the Guest Manager forces the analyzer to stop.
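The waiting behavior is a polling loop with a deadline. In this sketch, `is_done` stands in for the request to the agent and `force_stop` for forcing the analyzer to stop; both are hypothetical callables, since the agent's actual HTTP interface is not described here. The clock and sleep functions are injectable so the loop can be exercised without real waiting.

```python
import time


def wait_for_analyzer(is_done, force_stop, critical_timeout, poll_interval=1.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll until the analyzer reports completion or the critical timeout elapses."""
    deadline = clock() + critical_timeout
    while clock() < deadline:
        if is_done():           # e.g. ask the agent whether the analyzer has finished
            return "completed"
        sleep(poll_interval)
    force_stop()                # critical timeout reached: stop the analyzer forcibly
    return "critical timeout"
```

Using a monotonic clock keeps the deadline correct even if the host's wall-clock time is adjusted during an analysis.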
The Cuckoo Agent
The Cuckoo Agent is a simple HTTP server that allows for starting processes and uploading files. It resides inside the VM and should be started as soon as the operating system starts. The Guest Manager uses the Agent to upload and start the Analyzer.
The Analyzer
The Analyzer is the component that runs inside the guest VM. It contains all the logic and supporting modules required for the part of the analysis flow that is performed inside the machine. Because that flow and the required modules differ per platform, so does the Analyzer: the Guest Manager selects the Analyzer that matches the platform of the machine being used, as specified in the configuration file.
Once the Analyzer has been started by the Agent, it reads the configuration it has received. This configuration describes the target: either a URL or the path of a file on the VM that should be executed. A target is executed using an analysis package, which describes how to open the target: for example, whether a URL should be opened in Internet Explorer or Firefox, or how a certain file, such as a .docx or .jar, should be opened.
The analysis package can be specified when submitting a target. If it was not, the analyzer tries to pick the most suitable package based on the information about the target in the configuration.
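Package selection can be sketched as "use the submitted package if there is one, otherwise guess". The configuration keys, package names, and the extension table below are hypothetical, and guessing by file extension is a simplification: a real analyzer may rely on richer file-type information.

```python
import os

# Hypothetical extension-to-package table, for illustration only.
PACKAGES_BY_EXTENSION = {
    ".exe": "exe",
    ".dll": "dll",
    ".doc": "doc",
    ".docx": "doc",
    ".pdf": "pdf",
    ".jar": "jar",
    ".zip": "zip",
}


def choose_package(config):
    """Use the package given at submission time, otherwise guess one from the target."""
    if config.get("package"):
        return config["package"]  # an explicit choice always wins
    if config["category"] == "url":
        return "ie"               # hypothetical default browser package for URLs
    ext = os.path.splitext(config["file_name"])[1].lower()
    return PACKAGES_BY_EXTENSION.get(ext, "generic")
```

For example, `choose_package({"category": "file", "file_name": "Invoice.DOCX"})` picks the `doc` package, while an unknown extension falls back to `generic`.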
Before the target is started, the guest auxiliary modules are started. These are supporting modules that can contain any logic, just like the auxiliary modules on the host. Examples are the human module, which simulates human behavior, and the screenshot module, which captures screenshots.
When a target is started on Windows, it is injected with the Cuckoo monitor DLL. This DLL will try to log any behavior it sees by hooking functions, following processes, etc. All the collected behavioral data is sent to the result server located on the host.
The analyzer keeps running as long as at least one target process still exists and the analysis timeout has not been reached.
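The two stop conditions can be sketched as a loop over a set of tracked process IDs. `poll_events` is a hypothetical callable returning newly observed process IDs (for example, children the monitor followed) and IDs that have exited; the clock and sleep functions are injected so the loop can run without real time passing.

```python
def analyzer_loop(initial_pids, poll_events, timeout, clock, sleep, interval=1.0):
    """Hypothetical main loop: run while a target process is alive AND time remains."""
    tracked = set(initial_pids)
    start = clock()
    while True:
        new_pids, exited_pids = poll_events()
        tracked |= set(new_pids)      # processes spawned by the target are followed too
        tracked -= set(exited_pids)
        if not tracked:
            return "all target processes exited"
        if clock() - start >= timeout:
            return "analysis timeout reached"
        sleep(interval)
```

Whichever condition trips first ends the analysis inside the guest.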
The Result Server
Before the machine is started, the Analysis Manager registers the machine's IP address and the task ID with the Result Server. The Result Server handles incoming data streams and stores them in the correct format and in the correct task directory.
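The registration step is what lets the Result Server attribute an incoming upload to a task. Below is a minimal sketch of that bookkeeping, assuming uploads arrive as a peer IP, a relative path, and raw bytes; the class and method names and the directory layout are hypothetical. It also rejects paths that would escape the task directory, since the data comes from a machine running malware.

```python
import os


class ResultServerRegistry:
    """Hypothetical bookkeeping for a result server: which task each VM IP belongs to."""

    def __init__(self, storage_root):
        self.storage_root = storage_root
        self.tasks_by_ip = {}

    def add_task(self, task_id, machine_ip):
        self.tasks_by_ip[machine_ip] = task_id  # registered before the VM starts

    def del_task(self, machine_ip):
        self.tasks_by_ip.pop(machine_ip, None)

    def store(self, peer_ip, relative_path, data):
        """Write an uploaded stream into the directory of the task owning peer_ip."""
        task_id = self.tasks_by_ip.get(peer_ip)
        if task_id is None:
            raise PermissionError(f"no task registered for {peer_ip}")
        task_dir = os.path.realpath(os.path.join(self.storage_root, str(task_id)))
        target = os.path.realpath(os.path.join(task_dir, relative_path))
        # Refuse paths such as "../../etc/passwd" that would escape the task directory.
        if os.path.commonpath([task_dir, target]) != task_dir:
            raise ValueError(f"path escapes task directory: {relative_path}")
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "wb") as f:
            f.write(data)
        return target
```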
Processing Modules, Signatures and Reporting Modules
After the machine has stopped, the collected behavioral data needs to be processed. Processing in Cuckoo means, first, translating all intercepted behavioral data into data that can be used by the signatures and, second, making sure that the data can be displayed to the end user in the form of a report. To do this, all processing modules are run. Examples of processing modules include translating the collected system calls into a readable and searchable format, performing static analysis, extracting network streams, and searching the process memory dumps. All executed modules contribute to a structured set of results that can be used by the Cuckoo signatures and reporting modules.
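One way to picture "all executed modules contribute to a structured set of results" is that each module fills in one top-level key. The base class, the two example modules, and the shape of the raw data below are hypothetical illustrations, not Cuckoo's processing API.

```python
class ProcessingModule:
    """Hypothetical base class: each module fills in one top-level key of the results."""
    key = ""

    def run(self, raw):
        raise NotImplementedError


class BehaviorCalls(ProcessingModule):
    """Turns raw (pid, api, arguments) tuples into searchable per-call records."""
    key = "behavior"

    def run(self, raw):
        return [
            {"pid": pid, "api": api, "arguments": args}
            for pid, api, args in raw.get("calls", [])
        ]


class NetworkHosts(ProcessingModule):
    """Lists the distinct destination hosts contacted by the guest."""
    key = "network"

    def run(self, raw):
        return {"hosts": sorted({dst for _src, dst in raw.get("connections", [])})}


def run_processing(modules, raw):
    """Run every module and merge their output into one structured result set."""
    results = {}
    for module in modules:
        try:
            results[module.key] = module.run(raw)
        except Exception as exc:  # one failing module should not sink the whole report
            results[module.key] = {"error": str(exc)}
    return results
```

Isolating each module's failure keeps a single broken parser from losing the rest of the report.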
The Cuckoo signatures are run after processing is completed. If a signature matches, the signature and its indicators of compromise are added to the set of results. As the final step of the analysis, all reporting modules are run. These modules store the results in various formats for various purposes. Two common examples are writing a JSON file and reporting to MongoDB, which the Cuckoo web interface uses to display results.
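The signature and reporting stages can be sketched on top of a results dictionary shaped like the processing output. The `Signature` base class, its `check` method, the example signature, and `report_json` are hypothetical names chosen for this illustration; real Cuckoo signatures use their own API.

```python
import json


class Signature:
    """Hypothetical base class: check() returns indicators; an empty list means no match."""
    name = ""
    severity = 1

    def check(self, results):
        raise NotImplementedError


class DropsExecutableInTemp(Signature):
    name = "drops_exe_in_temp"
    severity = 2

    def check(self, results):
        hits = []
        for call in results.get("behavior", []):
            path = call["arguments"].get("FileName", "")
            lowered = path.lower()
            if call["api"] == "CreateFileW" and "\\temp\\" in lowered and lowered.endswith(".exe"):
                hits.append(path)  # the file path is the indicator of compromise
        return hits


def run_signatures(signatures, results):
    """Add every matching signature, with its indicators, to the results."""
    matched = []
    for sig in signatures:
        marks = sig.check(results)
        if marks:
            matched.append({"name": sig.name, "severity": sig.severity, "marks": marks})
    results["signatures"] = matched
    return results


def report_json(results, path):
    """A minimal reporting module: write the final results as a JSON file."""
    with open(path, "w") as f:
        json.dump(results, f, indent=2)
```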
Finally, when reporting has finished, the task is marked as reported and the results are now ready to be interpreted.
Originally published here: https://hatching.io/blog/cuckoo-sandbox-architecture