[From the sandbox] Writing a wasm loader for Ghidra. Part 1: Problem statement and setting up the environment

This week, evil forces (multinationals and government) all of a sudden made a gift to humanity. Microsoft opened the source code of the Windows calculator, while the NSA (National Security Agency) opened the sources of its software reverse engineering framework. This event divided the security community into two groups. The first started doing static analysis and fuzzing of the Windows calculator. The second started playing with the new toy from the NSA. The tool is called Ghidra, and professional resources are full of impressions from security researchers. According to the feedback, it's a really amazing tool, able to compete with existing solutions such as IDA Pro, R2 and JEB. Actually, they all have a reason: it's not every day that government organisations provide access to their internal tools. As a professional reverse engineer and malware analyst, I couldn't pass it by either. I decided to spend a weekend getting a first impression of the tool. As it's a large framework and I've chosen quite a complicated task, I'll break the article into several parts.

I prefer three methods of studying tools.

The first is to write my own plugins and add-ons. Using the API provided by the developers, I can understand the architecture and extensibility. After this I know how difficult it would be to change the functionality of the tool and adapt it to my purposes.

The second is solving tasks from CTFs, that is, reverse engineering challenges. The tasks in those challenges are usually isolated and demonstrate a certain problem or approach. With experience in CTFs and a set of solved tasks, it's easy to find a task suitable for testing the required functionality.

And last but not least is sharing experience. It's much easier to pay attention to details when you keep in mind that you will have to present them to others.

In this particular case I decided to combine the methods and write articles on how I develop a Ghidra add-on to solve a CTF task.

Let's set ourselves the task. Last year the security company FireEye hosted a CTF contest named Flare-On. During this contest researchers had to solve twelve tasks related to reverse engineering. One of the tasks was to research a web application built with WebAssembly. It's a relatively new executable format, and as far as I know, there are no perfect tools to deal with it. During the challenge, I tried several tools to defeat it. Those were simple scripts from GitHub and known decompilers, such as IDA Pro and JEB. Surprisingly, I settled on Chrome, which provides a pretty good disassembler and debugger for WebAssembly. Let's check how well we can solve the challenge with Ghidra. I'll try to describe the study as fully as possible and give all the information needed to reproduce it. As a person who doesn't have much experience with the instrument, I might go into some unnecessary details, but it is what it is.

Ghidra can be downloaded from https://ghidra-sre.org/. Since it's written in Java, installation takes almost no special effort: all you need to do is unpack the archive and run the application. The only requirement is to update the JDK and JRE to version 11.

The task I'm going to use for the study can be downloaded from the flareon5 challenge site. There's a file 05_web2point0.7z: an archive encrypted with the scary password "infected". There are three files in the archive: index.html, main.js and test.wasm. Let's open the file index.html in a browser and check out the result:

5zo2fdf7jjndo2tmhesju4go7io.png

Well, that's what I'll work with. Let's start by studying the html, especially since it's the easiest part of the challenge. The html code doesn't contain anything except the loading of the main.js script.

The script doesn't do anything complicated either, although it looks a bit more verbose. It just loads the file test.wasm and uses it to create a WebAssembly instance. Then it reads the parameter "q" from the url and passes it to the method Match, exported by the instance. If the string in the parameter is incorrect, the script shows the image we've seen above, which the FireEye developers call "Pile of poo".

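    // `a` is a byte array defined earlier in main.js (not shown in this excerpt)
    // Encode the "q" URL parameter as UTF-8 bytes
    let b = new Uint8Array(new TextEncoder().encode(getParameterByName("q")));
    // Allocate a buffer inside the wasm instance for each array and copy the data in
    let pa = wasm_alloc(instance, 0x200);
    wasm_write(instance, pa, a);
    let pb = wasm_alloc(instance, 0x200);
    wasm_write(instance, pb, b);
    // Match returns 1 when the parameter is correct
    if (instance.exports.Match(pa, a.byteLength, pb, b.byteLength) == 1) {
        // PARTY POPPER
        document.getElementById("container").innerText = "🎉";
    } else {
        // PILE OF POO
        document.getElementById("container").innerText = "💩";
    }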


The solution of the task is to find the value of the parameter q for which the function Match returns 1. To accomplish this, we need to disassemble the file test.wasm and understand the algorithm of the function Match.

Let's create a new Ghidra project (File→New Project) and call it "wasm".

yt1j2sgpytbsoyhsschjdit4v7m.png

Then add the file test.wasm to the project (File→Import file) and see what Ghidra can do with it.

upwqjqan-5cooznznqw5pcks-pw.png

Well, it can do nothing. It doesn't recognize the wasm format and can't disassemble anything, so it's absolutely powerless to deal with this task. Finally we've come to the subject of the article. Let's write a module which is able to load a wasm file, analyze it and disassemble its code.

It's a rather slow process, and one article doesn't look like enough for it (I guess even one weekend isn't enough). By the end of this article I hope to set up the development environment and build a minimal module which will be able to recognize the format of a wasm file and suggest the right disassembler for it.

First of all I studied all the available documentation. Actually, there's only one document suitable for the task: the slides GhidraAdvancedDevelopment, showing the process of add-on development. I'm going to follow the document, describing my every step.

Unfortunately, add-on development requires Eclipse. All my experience with Eclipse is the development of two gdx games for Android in 2012. It was two weeks full of pain and suffering, after which I erased Eclipse from my mind. I hope that after 7 years of development it's better than it used to be.

Let's download and install Eclipse from the official site.

Then install the extension for Ghidra development:
Go to the Eclipse Help→Install New Software menu.
Click Add and choose GhidraDev.zip from /Extensions/Eclipse/GhidraDev/, then install the extension and restart Eclipse. The extension adds templates to the new project menu, allows debugging modules from Eclipse and compiles a module into a distribution package.

According to the developers' docs, the following steps must be done to add a module for processing a new binary format:

  • Create classes describing the data structures.
  • Develop a loader. The loader should inherit from the class AbstractLibrarySupportLoader. It reads all the necessary data from the file, checks data integrity and converts the binary data to an internal representation, preparing it for analysis.
  • Develop an analyzer. The analyzer inherits from the class AbstractAnalyzer. It takes the data structures prepared by the loader and annotates them (I'm not really sure what that means, but I hope to understand it during development).
  • Add a processor. Ghidra has an abstraction called a processor, which describes the instruction set, memory layout and other architectural features. It uses its own internal declarative language to describe the processor. I'm going to cover this topic when writing the disassembler.


Now that we know the theory, it's time to create the module project. Thanks to the GhidraDev Eclipse extension, we have the module template right in the File→New project menu.

irz377yf_rcsginnfb-e8-ee-ga.png

The wizard asks which components are required. As described before, we need two of them: a loader and an analyzer.

vfccj1djknkxefjuc1bg9pysycs.png

The wizard creates a project skeleton with all the necessary parts: a blank analyzer in the file WasmAnalyzer.java, a blank loader in the file WasmLoader.java and a language skeleton in the directory /data/languages.

bu3quwuupe-0heogbta81hrub7c.png

Let's start with the loader. As mentioned, it should inherit from the class AbstractLibrarySupportLoader. We need to override a couple of the parent class methods. The first and simplest method is getName; it just returns the name of the loader:

    public String getName() {
        return "WebAssembly";
    }


The second method is findSupportedLoadSpecs. It's called by the tool during the import of a file and should verify whether the loader is able to process the file. If it is, the method returns a collection of LoadSpec objects, which tell which loader is used to load the file and which disassembler is required to analyze it.

First comes format verification. Let's load the file into the hex editor 010 and study its structure.

kbtzc9sajnotf-ukum29stsj8dm.png

The first eight bytes are the signature "\0asm" and the version. The loader will check them before processing the file. Let's create a class WasmHeader implementing the interface StructConverter, which is the base interface for describing structured data.

The constructor of WasmHeader receives a BinaryReader object, an abstraction used to read data from the binary source being analyzed. The constructor uses it to read the header of the input file:

    private byte[] magic;
    private byte[] version;

    public WasmHeader(BinaryReader reader) throws IOException {
        // Signature bytes come first, followed by the version bytes
        magic = reader.readNextByteArray(WASM_MAGIC_BASE.length());
        version = reader.readNextByteArray(WASM_VERSION_LENGTH);
    }
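
To make the eight-byte check concrete, here is a minimal standalone sketch in plain Java, without any Ghidra classes. The class and method names (WasmHeaderCheck, hasWasmMagic, readVersion) are my own illustration and not part of the module. The sketch assumes the version field is a 4-byte little-endian integer that follows the 4-byte signature.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Illustration only: hypothetical helper, independent of Ghidra's BinaryReader.
class WasmHeaderCheck {
    // "\0asm": the four signature bytes at the start of a wasm binary
    static final byte[] WASM_MAGIC = {0x00, 0x61, 0x73, 0x6D};
    // Signature (4 bytes) plus version (4 bytes)
    static final int HEADER_LENGTH = 8;

    /** Returns true when the data is long enough and starts with the wasm signature. */
    public static boolean hasWasmMagic(byte[] data) {
        if (data == null || data.length < HEADER_LENGTH) {
            return false;
        }
        byte[] prefix = Arrays.copyOfRange(data, 0, WASM_MAGIC.length);
        return Arrays.equals(prefix, WASM_MAGIC);
    }

    /** Reads the version field (bytes 4..7), assumed to be little-endian. */
    public static int readVersion(byte[] data) {
        return ByteBuffer.wrap(data, 4, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    public static void main(String[] args) {
        byte[] header = {0x00, 0x61, 0x73, 0x6D, 0x01, 0x00, 0x00, 0x00};
        System.out.println(hasWasmMagic(header)); // true
        System.out.println(readVersion(header));  // 1
    }
}
```

In the module itself the same kind of comparison would run on the bytes that WasmHeader reads through BinaryReader.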


The loader verifies the signature and, if it matches, searches for an appropriate processor. It calls the method query of the class QueryOpinionService and passes it the name of the loader ("WebAssembly"). The opinion service looks for a processor associated with this loader and returns it.

    List<QueryResult> queries = QueryOpinionService.query(getName(), MACHINE, null);


Naturally it returns nothing, because Ghidra doesn't know what WebAssembly is. It's time to tell it. As I said before, the wizard created the language skeleton in the directory data/languages.

ckhq0d4gena4dbalhjufpwv9kbs.png

At the current stage there are two files which might be interesting: Webassembly.opinion and Webassembly.ldefs. The .opinion file sets the correspondence between the loader and the processor.

It contains simple xml with a few attributes. We need to set the name of the loader in the attribute "loader" and the name of the processor in the attribute "processor"; both are "Webassembly". At this step I'll fill the other parameters with random values. As soon as I know more about the Webassembly processor architecture, I'll change them to correct values.
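
The idea behind the opinion lookup can be modelled in a few lines of plain Java. This is only a toy model and not how Ghidra implements QueryOpinionService: the class OpinionTableSketch and its methods are hypothetical, and real opinion entries carry more attributes than a single processor name. It only shows why the query comes back empty until an entry for the loader exists.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: NOT Ghidra's QueryOpinionService, just the matching idea.
class OpinionTableSketch {
    // Loader name -> processor names registered for that loader
    private final Map<String, List<String>> opinions = new HashMap<>();

    /** Registers a loader/processor pair, as an .opinion entry would. */
    public void register(String loader, String processor) {
        opinions.computeIfAbsent(loader, key -> new ArrayList<>()).add(processor);
    }

    /** Returns the processors known for the loader, or an empty list if there are none. */
    public List<String> query(String loader) {
        return opinions.getOrDefault(loader, Collections.<String>emptyList());
    }

    public static void main(String[] args) {
        OpinionTableSketch table = new OpinionTableSketch();
        // Nothing is registered yet, so the lookup finds no processor
        System.out.println(table.query("WebAssembly").isEmpty());
        // After the entry is added, the same lookup succeeds
        table.register("WebAssembly", "Webassembly");
        System.out.println(table.query("WebAssembly"));
    }
}
```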

The .ldefs file describes features of the processor which should execute the code from the file.

    Webassembly Language Module

The attribute "processor" should be the same as the attribute processor from the .opinion file. Let's leave the other fields untouched. But let's remember for next time that it's possible to set the register bit width (attribute "size"), the file describing the processor architecture (attribute "processorspec") and the file containing the description of the code in a special declarative language (attribute "slafile"). It'll come in handy when working on the disassembler.

Now it's time to get back to the loader and return the load specification.

Everything's ready for the test run. The GhidraDev plugin has added the run option "Run→Run As→Ghidra" to Eclipse:

j5z-ft7xxiy1equy99ahzrt6d7a.png

It runs Ghidra in debug mode and deploys the module there, giving a great opportunity to work with the tool and at the same time use the debugger to check how the module works. But at this simple stage there is no reason to use a debugger. As before, I'll create a new project, import the file and see whether my efforts paid off. Unlike last time, the file is recognized as WebAssembly and has the corresponding language. That means everything works: my loader is able to recognize the format and choose the corresponding processor, created by me (which is empty at the moment).

zsnpvdvzqx5ecg7xdlz86dzres0.png

In the next article I'll write a real loader, which not only recognizes but also describes the structure of the wasm file. I think at this stage, now that the environment is set up, it will be easy to do.

The code for the article is available in the GitHub repository.

© Habrahabr.ru