Checking the Ark Compiler Recently Made Open-Source by Huawei


In the summer of 2019, Huawei gave a series of presentations announcing the Ark Compiler technology. The company claims that this open-source project will help developers make the Android system and third-party software much more fluent and responsive. By tradition, we run every promising new open-source project through PVS-Studio to evaluate the quality of its code.

Introduction


Huawei first announced the Ark Compiler at the launch of its new smartphone models, the P30 and P30 Pro. The company claims that the Ark Compiler will improve the fluency of the Android system by 24% and its response speed by 44%. Third-party Android applications are also said to gain a 60% speed-up after recompilation with the Ark Compiler. The open-source version of the project is called OpenArkCompiler; its source code is available on Gitee, a Chinese alternative to GitHub.
To check this project, I used the PVS-Studio static code analyzer. This is a tool for detecting bugs and potential vulnerabilities in the source code of C, C++, C#, and Java programs.

The project’s size is 50 KLOC, so checking it didn’t take long. A small project means modest results: this article focuses on 11 of the 39 High- and Medium-level warnings.

Defects found in the code


Warning 1

V502 Perhaps the '?:' operator works in a different way than it was expected. The '?:' operator has a lower priority than the '==' operator. mir_parser.cpp 884

enum Opcode : uint8 {
  kOpUndef,
  ....
  OP_intrinsiccall,
  OP_intrinsiccallassigned,
  ....
  kOpLast,
};

bool MIRParser::ParseStmtIntrinsiccall(StmtNodePtr &stmt, bool isAssigned) {
  Opcode o = !isAssigned ? (....)
                         : (....);
  auto *intrnCallNode = mod.CurFuncCodeMemPool()->New(....);
  lexer.NextToken();
  if (o == !isAssigned ? OP_intrinsiccall : OP_intrinsiccallassigned) {
    intrnCallNode->SetIntrinsic(GetIntrinsicID(lexer.GetTokenKind()));
  } else {
    intrnCallNode->SetIntrinsic(static_cast(....));
  }
  ....
}


We are interested in the following part:

if (o == !isAssigned ? OP_intrinsiccall : OP_intrinsiccallassigned) {
  ....
}


The precedence of the '==' operator is higher than that of the ternary operator (?:). Therefore, the condition is not grouped the way the programmer intended; it is equivalent to the following code:

if ((o == !isAssigned) ? OP_intrinsiccall : OP_intrinsiccallassigned) {
  ....
}


Since the constants OP_intrinsiccall and OP_intrinsiccallassigned are non-zero, the condition is always true, which means the body of the else branch is unreachable.
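The effect of the missing parentheses can be reproduced in isolation. The sketch below assumes a reduced Opcode enum containing only the three enumerators shown above (the real enum has many more), and the function names AsWritten and AsIntended are hypothetical. Wrapping the ternary operator in parentheses gives what was presumably the intended comparison:

```cpp
#include <cstdint>

// Reduced, hypothetical version of the project's Opcode enum.
enum Opcode : std::uint8_t {
  kOpUndef,                  // 0
  OP_intrinsiccall,          // 1
  OP_intrinsiccallassigned,  // 2
};

// The condition as written: parsed as ((o == !isAssigned) ? A : B).
// Both A and B are non-zero, so the result is always true.
constexpr bool AsWritten(Opcode o, bool isAssigned) {
  return static_cast<bool>(o == !isAssigned ? OP_intrinsiccall
                                            : OP_intrinsiccallassigned);
}

// The presumably intended condition: compare o with the selected opcode.
constexpr bool AsIntended(Opcode o, bool isAssigned) {
  return o == (!isAssigned ? OP_intrinsiccall : OP_intrinsiccallassigned);
}

// The condition as written accepts even a mismatching opcode...
static_assert(AsWritten(OP_intrinsiccall, true), "always true");
// ...while the parenthesized version rejects it.
static_assert(!AsIntended(OP_intrinsiccall, true), "opcode mismatch");
static_assert(AsIntended(OP_intrinsiccallassigned, true), "opcode match");
```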

Warning 2

V570 The 'theDoubleVal' variable is assigned to itself. lexer.cpp 283

int64 theIntVal = 0;
float theFloatVal = 0.0;
double theDoubleVal = 0.0;

TokenKind MIRLexer
::GetFloatConst(uint32 valStart, uint32 startIdx, bool negative) {
  ....
  theIntVal = static_cast<int64>(theFloatVal);
  theDoubleVal = static_cast<double>(theDoubleVal); // <=
  if (theFloatVal == -0) {
    theDoubleVal = -theDoubleVal;
  }
  ....
}


The theDoubleVal variable is assigned to itself, so the statement has no effect. The developer probably intended to store the result in theFloatVal, because that is the variable checked in the next line. If so, the value should also be cast to float, not double. I think the fixed version should look like this:

theFloatVal = static_cast<float>(theDoubleVal);
if (theFloatVal == -0) {
  theDoubleVal = -theDoubleVal;
}


or even like this if the programmer simply wrote the wrong variable in the condition:

if (theDoubleVal == -0) {
  theDoubleVal = -theDoubleVal;
}


I may still be wrong; perhaps this code should be fixed in some entirely different way. It does look obscure to an outside programmer like myself.

Warnings 3–5

V524 It is odd that the body of '-' function is fully equivalent to the body of '+' function. mpl_number.h 158

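template <typename T, typename Tag>
inline Number<T, Tag> operator+(const Number<T, Tag> &lhs,
                                const Number<T, Tag> &rhs) {
  return Number<T, Tag>(lhs.get() + rhs.get());
}

template <typename T, typename Tag>
inline Number<T, Tag> operator-(const Number<T, Tag> &lhs,
                                const Number<T, Tag> &rhs) {
  return Number<T, Tag>(lhs.get() + rhs.get());
}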


The header file mpl_number.h contains a lot of duplicate code with small modifications — and mistakes, of course. In this example, the addition and subtraction operators are implemented in the same way: the programmer forgot to change the operation sign in the body of the subtraction operator.

Other warnings of this type:

  • V524 It is odd that the body of '-' function is fully equivalent to the body of '+' function. mpl_number.h 233
  • V524 It is odd that the body of '-' function is fully equivalent to the body of '+' function. mpl_number.h 238
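For comparison, here is a minimal sketch of a correct operator pair. It assumes a simplified Number<T, Tag> wrapper that only stores a value and returns it from get(); the real class in mpl_number.h has more members, and IndexTag and Index are hypothetical names. A couple of compile-time checks, like the ones at the end, would expose a copy-pasted operator body immediately:

```cpp
// Hypothetical, simplified stand-in for the Number wrapper in
// mpl_number.h: it only stores a value of type T, tagged with Tag.
template <typename T, typename Tag>
class Number {
 public:
  constexpr explicit Number(T value) : value_(value) {}
  constexpr T get() const { return value_; }

 private:
  T value_;
};

template <typename T, typename Tag>
constexpr Number<T, Tag> operator+(const Number<T, Tag> &lhs,
                                   const Number<T, Tag> &rhs) {
  return Number<T, Tag>(lhs.get() + rhs.get());
}

// Subtraction must actually subtract; the copied body used '+'.
template <typename T, typename Tag>
constexpr Number<T, Tag> operator-(const Number<T, Tag> &lhs,
                                   const Number<T, Tag> &rhs) {
  return Number<T, Tag>(lhs.get() - rhs.get());
}

struct IndexTag {};                   // hypothetical tag type
using Index = Number<int, IndexTag>;  // hypothetical alias

// Compile-time checks like these catch a copy-pasted operator body.
static_assert((Index(5) + Index(3)).get() == 8, "operator+");
static_assert((Index(5) - Index(3)).get() == 2, "operator-");
```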


Warning 6

V560 A part of conditional expression is always false: ! firstImport. parser.cpp 2633

bool MIRParser::ParseMIRForImport() {
  ....
  if (paramIsIPA && firstImport) {
    BinaryMplt *binMplt = new BinaryMplt(mod);
    mod.SetBinMplt(binMplt);
    if (!(*binMplt).Import(...., paramIsIPA && !firstImport, paramIsComb)) {
      ....
    }
    ....
  }
  ....
}


The firstImport variable is checked in the enclosing if statement, so it is always true inside that block. Therefore, the following expression always evaluates to false:

paramIsIPA && !firstImport


This code either contains a logic error or is overcomplicated and can be simplified by passing the constant false to the Import function.
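The reasoning can be checked mechanically. In this sketch, the Import call is reduced to the value of its flag argument, and the function name ImportFlag is hypothetical:

```cpp
// Hypothetical reduction of the pattern: the flag passed to Import()
// is computed only inside the block guarded by 'paramIsIPA && firstImport'.
// Outside that block the function just returns false as a placeholder.
constexpr bool ImportFlag(bool paramIsIPA, bool firstImport) {
  return (paramIsIPA && firstImport) ? (paramIsIPA && !firstImport)
                                     : false;
}

// The only input that enters the block is (true, true),
// and there the flag is false.
static_assert(!ImportFlag(true, true), "flag is always false in the block");
```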

Warning 7

V547 Expression 'idx >= 0' is always true. Unsigned type value is always >= 0. lexer.h 129

char GetCharAtWithLowerCheck(uint32 idx) const {
  return idx >= 0 ? line[idx] : 0;
}


The idx >= 0 check makes no sense because idx is unsigned (uint32) and can never be negative. Perhaps idx was meant to be compared with some other value, such as an upper bound for indexing into the line array, or this meaningless check should simply be removed.
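If the intent was to avoid reading past the end of line, only an upper bound can reject a bad index. A minimal sketch, assuming (as a simplification) that the line is passed as a character pointer plus its length; GetCharAtChecked is a hypothetical name, not part of the project:

```cpp
#include <cstddef>
#include <cstdint>

// For an unsigned index, 'idx >= 0' is always true, so only the
// upper bound can actually reject an out-of-range index.
constexpr char GetCharAtChecked(const char *line, std::size_t length,
                                std::uint32_t idx) {
  return idx < length ? line[idx] : '\0';
}

static_assert(GetCharAtChecked("abc", 3, 1) == 'b', "in range");
static_assert(GetCharAtChecked("abc", 3, 3) == '\0', "past the end");
// A "negative" index wraps around to a huge value and is rejected too.
static_assert(GetCharAtChecked("abc", 3, static_cast<std::uint32_t>(-1)) == '\0',
              "wrapped index");
```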

Warning 8

V728 An excessive check can be simplified. The '||' operator is surrounded by opposite expressions 'c != '\"'' and 'c == '\"''. lexer.cpp 400

TokenKind MIRLexer::GetTokenWithPrefixDoubleQuotation() {
  ....
  char c = GetCurrentCharWithUpperCheck();
  while ((c != 0) &&
         (c != '\"' || (c == '\"' && GetCharAtWithLowerCheck(....) == '\\'))) {
    ....
  }
  ....
}


The analyzer has spotted a code pattern that can be simplified. It looks similar to this form:

A || (!A && smth)


The right operand of '||' is evaluated only when A is false, so the !A check inside it is always true and can be dropped. The original expression can therefore be simplified as follows:

while ((c != 0) && (c != '\"' || (GetCharAtWithLowerCheck(....) == '\\'))) {
  ....
}
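With only two boolean inputs, the equivalence of the two forms can be verified exhaustively at compile time. In this sketch, a stands for c != '\"' and b for the GetCharAtWithLowerCheck(....) == '\\' test; the function names Original and Simplified are hypothetical:

```cpp
// a: c != '\"'    b: GetCharAtWithLowerCheck(....) == '\\'
constexpr bool Original(bool a, bool b) { return a || (!a && b); }
constexpr bool Simplified(bool a, bool b) { return a || b; }

// Exhaustive check over all four combinations of a and b.
static_assert(Original(false, false) == Simplified(false, false), "");
static_assert(Original(false, true) == Simplified(false, true), "");
static_assert(Original(true, false) == Simplified(true, false), "");
static_assert(Original(true, true) == Simplified(true, true), "");
```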


Warnings 9–10

V728 An excessive check can be simplified. The '(A && ! B) || (! A && B)' expression is equivalent to the 'bool (A) != bool (B)' expression. mir_nodes.cpp 1552

bool BinaryNode::Verify() const {
  ....
  if ((IsAddress(GetBOpnd(0)->GetPrimType()) &&
      !IsAddress(GetBOpnd(1)->GetPrimType()))
    ||
     (!IsAddress(GetBOpnd(0)->GetPrimType()) &&
       IsAddress(GetBOpnd(1)->GetPrimType()))) {
    ....
  }
  ....
}


This is another snippet that needs refactoring. I have split the condition across several lines to make it more readable; in the original source, it occupies two long lines, which makes it much harder to follow. The code can be rewritten in a simpler and clearer form:

if (IsAddress(GetBOpnd(0)->GetPrimType()) !=
    IsAddress(GetBOpnd(1)->GetPrimType())) {
  ....
}


Another code fragment to be refactored in a similar way:

  • V728 An excessive check can be simplified. The '(A && B) || (! A && ! B)' expression is equivalent to the 'bool (A) == bool (B)' expression. bin_mpl_import.cpp 702
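Both rewrites are safe when the operands are genuine bool values, as the return values of IsAddress presumably are. The analyzer's messages include an explicit bool(...) conversion because the shortcut breaks for other integral operands: 2 and 1 are both true in a logical sense, yet 2 != 1. A small sketch with hypothetical function names:

```cpp
// Long forms from the project, written here for int operands.
constexpr bool XorLong(int a, int b) { return (a && !b) || (!a && b); }
constexpr bool EqLong(int a, int b) { return (a && b) || (!a && !b); }

// Short forms that first convert both operands to bool.
constexpr bool XorShort(int a, int b) {
  return static_cast<bool>(a) != static_cast<bool>(b);
}
constexpr bool EqShort(int a, int b) {
  return static_cast<bool>(a) == static_cast<bool>(b);
}

// 2 and 1 are both "true", so the long forms report "equal"...
static_assert(!XorLong(2, 1) && EqLong(2, 1), "");
// ...and so do the short forms with the bool conversion...
static_assert(XorShort(2, 1) == XorLong(2, 1), "");
static_assert(EqShort(2, 1) == EqLong(2, 1), "");
// ...but comparing the raw values would give the opposite answer.
static_assert((2 != 1) != XorLong(2, 1), "");
```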


Warning 11

V1048 The 'floatSpec->floatStr' variable was assigned the same value. input.inl 1356

static void SecInitFloatSpec(SecFloatSpec *floatSpec)
{
  floatSpec->floatStr = floatSpec->buffer;
  floatSpec->allocatedFloatStr = NULL;
  floatSpec->floatStrSize = sizeof(floatSpec->buffer) /
                            sizeof(floatSpec->buffer[0]);
  floatSpec->floatStr = floatSpec->buffer;
  floatSpec->floatStrUsedLen = 0;
}


The analyzer has detected that floatSpec->floatStr is initialized twice with the same value. I believe the second, duplicate assignment can be removed.

Conclusion


Just a few days ago, we checked another Huawei project, Huawei Cloud DIS SDK. The company is currently open-sourcing its projects, which is good news for the developer community. Projects such as the Ark Compiler or Harmony OS are very young and haven’t become popular yet, so investing in code quality control at this stage should pay off: it can help avoid potential vulnerabilities and customer criticism.


© Habrahabr.ru