Java Bytecode Reverse Engineering

Abstract

 
This article shows how Java executables can be cracked by disassembling their bytecode.  Disassembling Java bytecode is the act of transforming it back into a readable form, ultimately Java source code.  It is a long-standing problem for the software industry, which loses revenue to software piracy.  Security engineers try to resist disassembly with techniques such as software watermarking and code obfuscation.  A large portion of this paper is dedicated to tactics that are commonly regarded as reverse engineering.  The methods presented here are intended for professional software developers, and each technique is demonstrated on custom-built applications.  We are not encouraging any kind of malicious hacking by presenting this article.  Rather, the contents of this paper help developers pinpoint vulnerabilities in their code and learn the methods available to shield their intellectual property from reversing.  We shall walk through the process of disassembling in order to obtain sensitive information and crack a Java executable without having the original source code.
 

Prerequisite

 
I presume that the reader has a thorough understanding of programming, debugging, and compiling Java on platforms such as Linux and Windows, as well as a working knowledge of JVM internals.  In addition, the following tools are required for bytecode reverse engineering:
  • JDK toolkit [javac, javap]
  • Eclipse
  • JVM
  • JAD

Java Byte code

 
Engineers usually construct software in a high-level language such as Java that is comprehensible to them but cannot be executed by the machine directly.  This textual form of a computer program, known as source code, must be converted into a form that the computer can execute.  Java source code is compiled into an intermediate language known as Java bytecode, which is not executed directly by the CPU but by a Java Virtual Machine.  Compilation is typically the act of transforming a high-level language into a low-level language, such as machine code or bytecode.  We do not strictly need to understand Java bytecode to write Java programs, but doing so can assist debugging and can help improve performance and memory consumption.
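As a small illustration of what the compiler emits, the hypothetical class below (AddDemo is not part of the original application) contains a single method. The comments show the instructions that javap -c typically lists for it, assuming a standard javac.

```java
// Hypothetical example class, used only to contrast source code with bytecode.
public class AddDemo {

    // For this method, javap -c typically lists:
    //   0: iload_0     // push the first int argument onto the operand stack
    //   1: iload_1     // push the second int argument
    //   2: iadd        // pop both values and push their sum
    //   3: ireturn     // return the int on top of the stack
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3));
    }
}
```

Four one-byte instructions are enough for this method, which hints at why bytecode is so much easier to read back than native machine code.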
  
JVM 
 
The JVM is essentially a simple stack-based machine that can be separated into several areas, such as the stack, heap, registers, method area and native method stacks.  An advantage of the virtual machine architecture is portability.  Any machine that implements the Java Virtual Machine specification is able to execute Java bytecode, in the manner of "Write once, run anywhere".  Java bytecode is not strictly linked to the Java language, and many compilers and other tools produce it, such as the Eclipse IDE, NetBeans and the Jasmin bytecode assembler.  Another advantage of the Java Virtual Machine is the runtime type safety of programs.  The Java Virtual Machine specification defines the required behavior of a Java Virtual Machine but does not specify any implementation details.  Therefore an implementation can be designed in various ways for diverse platforms, as long as it adheres to the specification.
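To make the term "stack-based" concrete, here is a minimal sketch of a toy interpreter. It is purely illustrative: the class StackMachineDemo and its run method are hypothetical, it understands only a handful of mnemonics, and it is in no way how a real JVM is implemented.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy interpreter; it mimics only the operand-stack idea, not a real JVM.
public class StackMachineDemo {

    // Executes a list of mnemonics against an operand stack and a local variable array.
    static int run(String[] program, int[] locals) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String insn : program) {
            switch (insn) {
                case "iload_0":
                    stack.push(locals[0]);   // push local variable 0
                    break;
                case "iload_1":
                    stack.push(locals[1]);   // push local variable 1
                    break;
                case "iadd":
                    stack.push(stack.pop() + stack.pop());   // replace two operands with their sum
                    break;
                case "ireturn":
                    return stack.pop();      // the result is whatever is on top of the stack
                default:
                    throw new IllegalArgumentException("Unsupported instruction: " + insn);
            }
        }
        throw new IllegalStateException("Program ended without ireturn");
    }

    public static void main(String[] args) {
        String[] program = {"iload_0", "iload_1", "iadd", "ireturn"};
        System.out.println(run(program, new int[] {2, 3}));
    }
}
```

Every instruction takes its operands from the stack and leaves its result there, so no instruction needs to name a register.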
 

Sample Cracked Application

 
The following Java console application, "LoginTest", was developed to demonstrate Java bytecode disassembling.  It validates users through a simple user name and password login mechanism.  We have obtained this application from another source as an unregistered user, and obviously we do not possess its source code.  As a result, we do not have the valid user name and password, which are provided only to registered users, and so we cannot log in.
 
[Figure: Java login test]
 
Even without the source code of the application or a set of login credentials, we can still manage to log in by disassembling its bytecode, which exposes sensitive information related to the user login.
 

Disassemble Bytecode

 
Disassembling is the reverse of compilation: the act of transforming a low-level language into a higher-level one.  It is practical for Java because the structure of bytecode is standard and well documented.  It basically recovers the source code from Java bytecode.  We typically run a disassembler to obtain the source code for given bytecode, just as a compiler is run to yield bytecode from source code.  Disassembling is used to determine the implementation logic in the absence of the relevant documentation and source code, which is why vendors explicitly prohibit disassembling and reverse engineering in their license agreements.  Here are some of the reasons to decompile:
  • Fixing critical bugs in the software for which no source code exists.
  • Troubleshooting software or a JAR that does not have proper documentation.
  • Recovering the source code that was accidentally lost.
  • Learning the implementation of a mechanism.
  • Learning to protect your code from reverse engineering.
The process of disassembling Java bytecode is quite simple, and not as complex as for a native C/C++ binary.  The first step is to compile the Java source file, which has a *.java extension, using the javac utility.  This produces a *.class file in which the bytecode resides.  Then, by using javap, a utility provided in the JDK toolkit, we can disassemble the bytecode from the corresponding *.class file.  The javap utility writes its listing to standard output, which can be redirected to a file such as a *.bc file.
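The two steps can also be driven from Java itself. The following is a minimal sketch, assuming a full JDK of version 9 or later, where javac and javap are exposed through java.util.spi.ToolProvider. The class DisassembleDemo and the method compileAndDisassemble are hypothetical names chosen for this illustration.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.spi.ToolProvider;

// Hypothetical helper; needs a full JDK (Java 9+) so that javac and javap can be found.
public class DisassembleDemo {

    // Compiles the given source into a temporary directory and returns the javap -c listing.
    static String compileAndDisassemble(String className, String source) {
        try {
            Path dir = Files.createTempDirectory("bytecode-demo");
            Path javaFile = dir.resolve(className + ".java");
            Files.write(javaFile, source.getBytes(StandardCharsets.UTF_8));

            ToolProvider javac = ToolProvider.findFirst("javac")
                    .orElseThrow(() -> new IllegalStateException("javac not available"));
            ToolProvider javap = ToolProvider.findFirst("javap")
                    .orElseThrow(() -> new IllegalStateException("javap not available"));

            // Step 1: *.java -> *.class
            StringWriter errors = new StringWriter();
            PrintWriter err = new PrintWriter(errors);
            PrintWriter ignored = new PrintWriter(new StringWriter());
            int rc = javac.run(ignored, err, "-d", dir.toString(), javaFile.toString());
            err.flush();
            if (rc != 0) {
                throw new IllegalStateException("javac failed: " + errors);
            }

            // Step 2: *.class -> textual bytecode listing
            StringWriter listing = new StringWriter();
            PrintWriter out = new PrintWriter(listing);
            rc = javap.run(out, err, "-c", dir.resolve(className + ".class").toString());
            out.flush();
            err.flush();
            if (rc != 0) {
                throw new IllegalStateException("javap failed: " + errors);
            }
            return listing.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String source = "public class Hello { static String s() { return \"secret\"; } }";
        System.out.println(compileAndDisassemble("Hello", source));
    }
}
```

Running the same two tools by hand from a command prompt gives the same listing; the programmatic form is only convenient when many class files have to be inspected.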
 
[Figure: javac]
 
Opening a *.class file does not give us access to the implementation logic.  If we open the class file generated by javac in Notepad or any other text editor, we find only seemingly bizarre data that is totally incomprehensible.  The following figure displays the .class file's data:
 
[Figure: .class file contents at the command prompt]
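The data looks like noise in a text editor because a class file is a binary format. Its first eight bytes are nonetheless well defined by the JVM specification: a four-byte magic number 0xCAFEBABE, followed by a two-byte minor and a two-byte major version. A minimal sketch that decodes them is shown below; ClassHeaderDemo and describeHeader are hypothetical names.

```java
import java.nio.ByteBuffer;

// Hypothetical helper that decodes the fixed header at the start of a class file.
public class ClassHeaderDemo {

    static String describeHeader(byte[] classBytes) {
        // ByteBuffer is big-endian by default, which matches the class file format.
        ByteBuffer buf = ByteBuffer.wrap(classBytes);
        int magic = buf.getInt();
        int minor = buf.getShort() & 0xFFFF;   // treat as unsigned 16-bit values
        int major = buf.getShort() & 0xFFFF;
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("Not a Java class file");
        }
        return String.format("magic=0x%08X minor=%d major=%d", magic, minor, major);
    }

    public static void main(String[] args) {
        // Header bytes of a class compiled for Java 8 (major version 52).
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52};
        System.out.println(describeHeader(header));
    }
}
```

For a real file, the byte array could be obtained with Files.readAllBytes on the path of the *.class file.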
 
So, opening the class file directly in a text editor is not successful at all.  Instead, we can use the WinHex editor to inspect the bytecode; it shows the implementation logic as hexadecimal bytes, along with the strings that are used in the application.  Although we can reverse engineer a Java application or reveal its sensitive information with WinHex, the operation is laborious: unless we know which hex byte corresponds to which instruction in the source code, we cannot obtain much information.
 
[Figure: bytecode in the WinHex editor]
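The strings that a hex editor shows next to the raw bytes can also be pulled out programmatically. The sketch below simply scans for runs of printable ASCII characters, in the spirit of the Unix strings tool. It is an approximation rather than a real constant pool parser, and ClassStringsDemo and printableStrings are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: collects runs of printable ASCII bytes of a minimum length.
public class ClassStringsDemo {

    static List<String> printableStrings(byte[] data, int minLength) {
        List<String> found = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (byte b : data) {
            if (b >= 0x20 && b < 0x7F) {
                run.append((char) b);          // extend the current printable run
            } else {
                if (run.length() >= minLength) {
                    found.add(run.toString()); // keep runs that are long enough
                }
                run.setLength(0);              // a non-printable byte ends the run
            }
        }
        if (run.length() >= minLength) {
            found.add(run.toString());
        }
        return found;
    }

    public static void main(String[] args) {
        // Two constant pool entries as they appear in a class file:
        // tag 1 (CONSTANT_Utf8), a two-byte length, then the characters.
        byte[] sample = {1, 0, 4, 'A', 'j', 'a', 'y', 1, 0, 4, 't', 'e', 's', 't'};
        System.out.println(printableStrings(sample, 4));
    }
}
```

Because string literals are stored in the class file as plain UTF-8 text, any hard-coded value survives compilation in readable form.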
 

Reversing Bytecode

 
It is relatively easy to disassemble the bytecode of a Java application compared with other binaries.  The javap utility that ships with the JDK toolkit plays a significant role in disassembling Java bytecode and in revealing sensitive information.  It typically accepts the name of a class (or a *.class file) as an argument, as in the following:
 
Drive:\> javap LoginTest
 
Once this command is issued, it shows the structure of the class behind the class file.  Note, however, that it displays only the signatures of the methods used in the source code, as in the following:
  Compiled from "LoginTest.java"
  public class LoginTest {
    public LoginTest();
    public static void main(java.lang.String[]);
    static boolean verify(java.lang.String, char[]);
  }
The complete bytecode of the Java executable, including the opcodes of each method, is displayed by the javap -c switch, as in the following:
 
Drive:\> javap -c LoginTest
 
The previous command dumps the entire bytecode of the program in the form of opcode instructions.  The meaning of each instruction in the context of this program is explained in a later section of this paper.  I have highlighted the important sections from which we can obtain critical information.
 
[Figure: javap -c output for LoginTest]
 
From line 62, we can easily conclude that the login mechanism is implemented by a method called verify, which checks whether the user has entered the correct username and password.  If the credentials are correct, a Login success message is displayed; otherwise:
 
[Figure: login test]
 
But we still do not have the username and password themselves.  If we analyze the instructions of the verify method, however, we can easily determine that the username and password are hard-coded in the code itself, as highlighted in the colored box in the following figure.
 
[Figure: verify method instructions, with the credentials highlighted in the colored box]
 
We finally come to the conclusion that this program accepts Ajay as the username and test as the password, both of which appear as operands of ldc instructions.
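For orientation, source code along the following lines would produce such bytecode. This is a hypothetical reconstruction and not the original program: only the class name, the method signatures, the two literals and the Login success message come from the analysis above, while the prompts, the failure message and the comparison logic are assumptions.

```java
import java.util.Arrays;
import java.util.Scanner;

// Hypothetical reconstruction of LoginTest, based only on what javap revealed.
public class LoginTest {

    static boolean verify(String user, char[] password) {
        // Both literals end up in the constant pool and are pushed with ldc,
        // which is exactly why they are visible in the javap -c listing.
        return "Ajay".equals(user) && Arrays.equals("test".toCharArray(), password);
    }

    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        System.out.print("User name: ");          // prompt text is an assumption
        String user = in.nextLine();
        System.out.print("Password: ");
        char[] password = in.nextLine().toCharArray();
        // "Login failed" is an assumed message; only "Login success" is mentioned above.
        System.out.println(verify(user, password) ? "Login success" : "Login failed");
    }
}
```

The weakness is the design choice of embedding the credentials as literals; any check that compares input against constants compiled into the class can be read back in the same way.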
 
Hence, we launch the application once again and enter the obtained credentials.  Bingo!  We have successfully subverted the login authentication mechanism without even having the source code, as in the following:
 
[Figure: Java login test page]
 

Byte Code Instruction Specification

 
As in assembly programming, Java machine code is represented by bytecode opcodes, which form the instructions that the JVM executes on any platform.  Each opcode is one byte long, so the instruction set can encode at most 256 distinct opcodes, each with its own mnemonic.  Java bytecode instructions fall into these major categories:
  • Load and store
  • Method invocation and return
  • Control transfer
  • Arithmetical operation
  • Type conversion
  • Object manipulation
  • Operand stack management
We shall discuss only the opcode instructions that are used in the previous Java binary.  The following table gives the meaning of each opcode along with its corresponding hex value:
 

Java Opcodes Meaning Hex value

 
[Figure: table of Java opcodes]
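As a rough companion to the table, the sketch below maps a few common mnemonics to their one-byte values as defined in the JVM specification. Which opcodes LoginTest actually uses cannot be recovered from this text, so the selection is an assumption, and OpcodeTable is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical lookup table with a small, assumed selection of JVM opcodes.
public class OpcodeTable {

    static final Map<String, Integer> OPCODES = new LinkedHashMap<>();

    static {
        OPCODES.put("iconst_0", 0x03);       // push int constant 0
        OPCODES.put("iconst_1", 0x04);       // push int constant 1
        OPCODES.put("ldc", 0x12);            // push a constant pool item, e.g. a String
        OPCODES.put("aload_0", 0x2a);        // load a reference from local variable 0
        OPCODES.put("dup", 0x59);            // duplicate the top operand stack value
        OPCODES.put("ifeq", 0x99);           // branch if the int on the stack is zero
        OPCODES.put("ifne", 0x9a);           // branch if the int on the stack is not zero
        OPCODES.put("goto", 0xa7);           // unconditional branch
        OPCODES.put("ireturn", 0xac);        // return an int from the method
        OPCODES.put("return", 0xb1);         // return void from the method
        OPCODES.put("getstatic", 0xb2);      // read a static field, e.g. System.out
        OPCODES.put("invokevirtual", 0xb6);  // call an instance method
        OPCODES.put("invokespecial", 0xb7);  // call a constructor, private or super method
        OPCODES.put("invokestatic", 0xb8);   // call a static method
        OPCODES.put("new", 0xbb);            // create a new object
    }

    public static void main(String[] args) {
        OPCODES.forEach((name, value) ->
                System.out.printf("%-14s 0x%02x%n", name, value));
    }
}
```

With such a table at hand, the hex bytes seen in an editor can be matched to the mnemonics that javap prints.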
 

In-Brief

 
This paper explained the mechanism of disassembling Java bytecode in order to reveal sensitive information when the source of the Java binary is unavailable.  We have seen how to carry out such reverse engineering using JDK utilities.  The article also covered the importance of bytecode disassembling and of JVM internals in the context of reversing bytecode, and explained the meaning of the essential opcodes.  Finally, we subverted the login authentication of a live Java console application by applying disassembling tactics.  In the forthcoming paper, we shall explain how to patch Java bytecode in the context of reverse engineering.