
How obfuscation helps protect Java from reverse engineering


Few things are more frustrating to programmers than running across a bug you can't solve without access to source code you don't have. Whether you're patching in code from an online open-source library or you're making calls to common operating system routines, you likely spend time each week crunching code that you didn't write and for which you may not have the source.

It's easy to reverse engineer Java class files because Java bytecode contains much of the same information as the original source code. In addition, Java programs have a reputation for being "write once, run anywhere." This flexibility has a number of potential advantages in a distributed environment. While not unique to the Java language, code decompilation has never been used so publicly or ubiquitously as it is among Java developers. The flip side of decompilation is obfuscation.

What is obfuscation?

Given the ease with which decompilers extract source code from compiled code, protecting your code and the valuable secrets therein isn't easy. As Java decompilers have come into regular use, so have Java obfuscators, which effectively put a smoke screen around your code.

Code obfuscation is currently one of the best methods for protecting Java code from reverse engineering. Obfuscation renders software unintelligible but still functionally equivalent to the original code. It makes programs more difficult to understand, so that they are more resistant to reverse engineering.

In Figure A, a set of class files, P, becomes another set of class files, P', through an obfuscator. The result is that the code of P' is not equal to the code of P, and P'.code is more difficult to understand than P.code, but both function the same way.

For example, let's take simple Java code from the original source (src1.java.txt):

class OriginalHello {
   public OriginalHello() {
       int number=1;
   }
   public String getHello(String helloname){
       return helloname;
   }
}

After this code is run through a simple obfuscator (such as KlassMaster), all names in the class are scrambled and the line numbers are removed. This is the obfuscated code (dst1.java.txt):

class a {
   public static boolean a;
   public a() {
      int a=1;
   }
   public String a(String b){
       return b;
   }
}

From the above, you can see that OriginalHello.class has been changed to a.class by the obfuscator KlassMaster, and its method, getHello(java.lang.String), has been renamed to a(java.lang.String). The method name a() is more difficult to understand than getHello(). When you compare the obfuscated bytecode with the original bytecode, you can also see that the line numbers have been removed from the obfuscated bytecode. This gives less information to reverse engineers.

This very simple example of code obfuscation just scrambles identifiers and removes the line numbers that compilers generate. Modern commercial obfuscators work quickly and produce code that is much tougher to untangle; however, there is still a lot of research being done in this area.
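The identifier scrambling described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class NameScrambler and its methods shortName and buildMapping are hypothetical names invented here, not part of KlassMaster or any other obfuscator. The sketch gives every identifier its own short name, whereas the example output above reuses the single name a for the class, a field, and a method, which a real tool can do because those names live in different namespaces.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not taken from any real obfuscator.
class NameScrambler {

    // Turns 0, 1, 2, ... into a, b, ..., z, aa, ab, ... (bijective base 26).
    static String shortName(int index) {
        StringBuilder sb = new StringBuilder();
        int n = index;
        do {
            sb.insert(0, (char) ('a' + n % 26));
            n = n / 26 - 1;
        } while (n >= 0);
        return sb.toString();
    }

    // Gives each distinct identifier the next short name, in first-seen order.
    static Map<String, String> buildMapping(List<String> identifiers) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String id : identifiers) {
            if (!mapping.containsKey(id)) {
                mapping.put(id, shortName(mapping.size()));
            }
        }
        return mapping;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = buildMapping(
            List.of("OriginalHello", "getHello", "helloname", "number"));
        mapping.forEach((from, to) -> System.out.println(from + " -> " + to));
    }
}
```

Note that the sketch does not skip generated names that happen to be Java keywords (such as "do" or "if"); a tool that rewrites source rather than class files would need to.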

Obfuscation techniques

Besides identifier renaming and line number removal, there is a set of tricks that various obfuscators use. One popular way to obscure source is to take the meaningless string trick to the next level by replacing a symbol in the class file with an illegal string. The replacement might be a keyword like private or, even worse, a completely meaningless symbol such as ***. Some virtual machines, especially in browsers, don't take kindly to such antics. Technically, a variable with a symbol such as = for a name is contrary to the Java specification, but some virtual machines will overlook it.
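To see why names such as private, *** or = are "illegal" at the source level, here is a minimal sketch that checks a candidate name against the identifier rules using the standard Character.isJavaIdentifierStart and Character.isJavaIdentifierPart methods. The class IdentifierCheck, the method isLegalSourceName, and the deliberately partial keyword list are assumptions made for this illustration.

```java
import java.util.Set;

// Hypothetical helper for illustration only.
class IdentifierCheck {

    // Deliberately partial keyword list; the language specification defines the full set.
    static final Set<String> SOME_KEYWORDS =
        Set.of("private", "public", "class", "int", "return", "if", "else");

    // True if the name follows the source-level identifier rules
    // (keywords missing from the partial list above are not caught).
    static boolean isLegalSourceName(String name) {
        if (name.isEmpty() || SOME_KEYWORDS.contains(name)) {
            return false;
        }
        if (!Character.isJavaIdentifierStart(name.charAt(0))) {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            if (!Character.isJavaIdentifierPart(name.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"getHello", "a", "private", "***", "="}) {
            System.out.println(name + " legal in source: " + isLegalSourceName(name));
        }
    }
}
```

A decompiler that emits such a name verbatim produces source that will not recompile, which is the point of the trick.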

Another technique some obfuscators use is targeted at specific decompilers such as Mocha and JODE. A bad instruction is injected into the code; it makes no difference when the code runs, but it crashes the decompiler.

As an example of such a bad instruction, let's take the original code (disassembled):

Method void main(java.lang.String[])
      0 new #4
      3 invokespecial #10
      6 return

and the code after obfuscation (but keeping the same names for simplicity):

Method void main(java.lang.String[])
      0 new #4
      3 invokespecial #10
      6 return
      7 pop

Note that the routine now has a pop instruction after the return. Obviously, a function can't do anything after it has returned; that's the trick. Placing an instruction after the return ensures that it will never be executed. The code here is essentially impossible to decompile because it doesn't correspond to any possible Java source code.

Other common obfuscation techniques include the following:

  • Layout obfuscations modify the layout structure of the program by two basic methods: renaming identifiers and removing debugging information. They make the program code less informative to a reverse engineer. Most layout obfuscations cannot be undone because they use one-way functions such as replacing identifiers with random symbols and removing comments, unused methods, and debugging information. Though layout obfuscations cannot prevent reverse engineers from understanding the program by studying the obfuscated code, they at least raise the cost of reverse engineering. Layout obfuscations are the most well studied and widely used in code obfuscation. Almost all Java obfuscators include this technique.
  • Control obfuscations change the control flow of the program. The trick is simple: for a routine A(), the obfuscator creates an additional routine A_bug() and an "if" selector, if (PREDICATE) then A_bug(); else A();. The PREDICATE is constructed so that it is always false, but in a way that makes this hard to deduce, so the A() routine is always selected instead of the buggy copy A_bug().
  • Data obfuscations break up the data structures used in the program and encrypt literals. This method includes modifying inheritance relations, restructuring arrays, etc. Data obfuscations thoroughly change the data structure of a program. They make the obfuscated code so complicated that recreating the original source code becomes impractical.
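The control and data obfuscations in the list above can be sketched as follows. Everything here is an assumption made for illustration: the class ObfuscationSketch, its method names, the particular predicate, and the single-byte XOR are not how any specific obfuscator works. The predicate relies on the fact that n * (n + 1) is always even (also after int overflow, since the lowest bit is unaffected), so the test for an odd result is always false. Commercial tools use predicates that are much harder to analyze, and XOR with a fixed key is encoding rather than real encryption.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch for illustration only.
class ObfuscationSketch {

    // Routine A(): the real behavior.
    static int a(int x) {
        return x + 1;
    }

    // Routine A_bug(): a deliberately wrong copy that should never run.
    static int aBug(int x) {
        return x - 1;
    }

    // Opaque predicate: a product of two consecutive integers is always even,
    // so this always returns false, although that is not obvious at a glance.
    static boolean predicate(int n) {
        return (n * (n + 1)) % 2 != 0;
    }

    // Control obfuscation: the "if" selector from the text.
    static int obfuscatedA(int x) {
        if (predicate(x)) {
            return aBug(x);
        } else {
            return a(x);
        }
    }

    // Data obfuscation: XOR every byte with a key; applying it twice restores the input.
    static byte[] xor(byte[] data, byte key) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ key);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("obfuscatedA(41) = " + obfuscatedA(41));

        byte key = 0x5A;
        byte[] hidden = xor("hello".getBytes(StandardCharsets.UTF_8), key);
        String restored = new String(xor(hidden, key), StandardCharsets.UTF_8);
        System.out.println("restored literal = " + restored);
    }
}
```

In an obfuscated class file only the encoded bytes and the decoding routine would be stored, so a string search over the binary no longer finds the literal.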

Conclusion

You should keep in mind that no obfuscator known today provides any guarantee about the difficulty of reverse engineering. Thus, obfuscators do not provide a level of security similar to that of modern encryption schemes, and they should be used in tandem with other measures in cases where security is of high importance.

The most common software reverse engineering attacks target copy protection schemes. These schemes generally rely heavily on existing operating system procedure calls, making it easy to bypass basic code obfuscation using the same tools used with unobfuscated code. In addition, obfuscated code often depends on the particular characteristics of the platform and compiler, making it difficult to manage if either changes.

Peter V. Mikhalenko is a Sun certified professional who works for Deutsche Bank as a business consultant.

16 comments
Justin James

I have to wonder what obfuscation does to the speed of the program, in many cases. Personally, I think that anyone who decompiles the code probably is not going to do too much with it. I mean, it is twenty times easier to just pirate software. Other than looking for security holes (and we all know that obscurity is hardly a defense against hacking), I just do not see decompiling code being a major issue. I suppose someone could demand it, but I cannot think of too many cases where it would be needed. J.Ja

javatech

I have found obfuscation to be almost indispensable, though not yet quite good enough. Specifically, we use it to make sure that only a very small number of developers have access to key classes that control data security. While everyone has access to the resulting class files, the source code remains offline. If other programmers don't know how these classes work, although they have access to the cryptographic API, it is much harder for them to find a way around it. I see no problem with obfuscation if used at the right time for the right purpose.

Locrian_Lyric

We used to call this crap "BAD PROGRAMMING". This is nothing more than a new wrinkle to the old practice of making your code illegible so nobody can maintain it. As far as it goes for me, I can live with it. It ups my asking price and gives me job security. It's a bad practice though and will likely make anyone who has to maintain this crap VERY agitated. To the suits that means YOU WILL HAVE TO PAY MORE FOR US.

Dmitry Leskov

To achieve the highest degree of protection, take a two-step approach: 1. Rename identifiers and encrypt string literals, but don't obfuscate code. 2. Compile the resulting classes down to native code using GCJ or Excelsior JET.

bblackmoor

Code obfuscation is both immoral and futile. It is a hallmark of poor programmers that they waste time worrying about people "stealing" their code, rather than on improving that code and making it as good as it can be.

Tony Hopkinson

your code with a simple cut and paste. Of course this idea relies on several idiotic assumptions: the code is valuable, the code is reusable as it is, the code is desirable... If any of the above were true, obfuscation would not stop someone reverse engineering it. The muppets whom obfuscation would deter probably wouldn't understand it anyway, even in plain 'English'.

apotheon

It sounds like, in translation, your reason for "liking" obfuscation is that you work with crappy programmers who can't code their way out of a wet paper bag, so you're better off keeping them in the dark to limit the amount of damage they can do. That's not a good reason for obfuscating source code. It's a good reason for firing some programmers and replacing them with people who aren't idiots, though.

apotheon

. . . stop wasting your time "obfuscating", and spend more of that time actually improving the software. You're not going to end up stopping anyone that wants the design of your software -- and there are far fewer people who actually want access to the internals than most companies realize.

chigozie_onyeuko

Where is the best place to hide a corpse? In the cemetery, of course. So to make things easier, and so that coders and writers of software do not have sleepless nights, the best they can do is not to code the software at all. Otherwise they will spend sleepless nights and still have their efforts wasted, as they will discover the next day that several 'smart guys' already have antidotes to their labour and already have their so-protected software up and running. Onyeuko Chigozie Teddy Pisa-Italy

mailtojava

A poor programmer is worrying about the decompiler... We work very hard and write the code, but a simple hacker gets our source code. How is that possible, man? How can you say that? What do you think, man, can you give the correct solution?

fuhrmeis

From the point of view of OpenSource it is likely a poor practice, but ClosedSource software needs a specific level of security against "code-napping". Which way is best for that, I don't know yet. The purposes of improving code and obfuscating it are as different as compiling and executing it. So what's the fuss? Since when has morality been a priority in commercial products? Nevertheless, I am with you! An OpenSource addict

royhayward

1. Obfuscation won't stop the motivated reverse engineer. 2. It doesn't improve the features, but will make bugs harder to find. 3. There are legitimate reasons to be able to reverse engineer your code. (Case in point: We had code written by someone on an H1B. They went back home and then never came back. After a bit, we discovered that the original source code had been lost, and working notes were in a foreign language. At that point we decompiled the code and reverse engineered it. If we had been using obfuscation the task would have been much harder.) 4. Obfuscation has always been available for all languages. But has also been little used due to the pace of change and development. The most this will do is make migrating to the next big language harder to do.

Locrian_Lyric

Yeah, it will stop "that" kind of coder... it just pisses everyone else off.

apotheon

Jumping through dozens of hoops in the attempt to "better" obfuscate code so others can't "steal" it is ultimately futile, and a waste of energy. If you depend on code that's closed source for business reasons, you're better off defending it in court with copyright law, and letting your developers work on improving the software rather than trying to make it harder to access the source code. If you do so for security reasons, you're better off staying out of the software business altogether, since obscurity is not an effective means of ensuring security.

apotheon

People working in shops where "closed source" is all they understand are often very myopic when it comes to understanding the security and value of their software. As royhayward pointed out, you can't stop a motivated developer from reverse-engineering your software, and obfuscation won't help. Furthermore, there are often legitimate reasons for such activities, aside from "stealing" (which is a ridiculous term to use for copyright infringement) code, as royhayward also pointed out. Ultimately, all obfuscation does is make things more difficult for people who are supposed to have access to the source, like the developers.