asp tutorials, asp.net tutorials, sample code, and Microsoft news from 15Seconds
Intellectual Property Protection and Code Obfuscation
By Adnan Masood
    Introduction

    "Who can afford to do professional work for nothing? What hobbyist can put three man-years into programming, finding all bugs, documenting his product, and distributing it for free?"
    -Bill Gates in his "Open Letter to Hobbyists," 1976. Excerpt from Free as in Freedom: Richard Stallman's Crusade for Free Software

    The recent leak of Microsoft Windows source code has raised serious concerns in security and intellectual property protection circles. Software, an intangible yet highly valuable commodity, is now pervasive in organizations in many forms, and its theft and disassembly can be more damaging to an organization than the burglary of any replaceable asset. Theft or leakage of source code, in part or in full, hurts an organization's credibility, increases the risk that bugs found in the code will be exploited, and gives competitors an unfair advantage.

    It's rare for a proprietary operating system's source code to be posted on the Internet, but if that worries you, consider what it means to distribute a program that retains most or all of the information present in its original source. Today's major platforms, .NET and J2EE, rely on virtual machines (VMs) that execute intermediate code which can, in principle, run on any machine. The slogan 'Write Once, Run Anywhere' sounds very attractive, but because the code is exposed through the virtual machine, additional security measures are needed. Since intermediate language can be decompiled back into something close to the original source, your highly valued commodity is at risk. In this article, I'll discuss the potential perils in the VM arena: how virtual machines work, what code obfuscation is, how the open source community views intellectual property, and what steps are involved in executing a CLR-based program. We'll also look at how .NET's Reflection APIs work and how they can be used to read a Portable Executable. Let us explore these topics in detail. Welcome to the uncharted waters of .NET.

    A virtual machine, as its name suggests, emulates a hardware machine in software. Its architecture is not bound to any physical machine; instead, the code is executed by software such as an interpreter. A VM also provides a security "sandbox" that protects the underlying resources. The idea of a VM is not new; it dates back to IBM's virtual machine work in the mid-1960s, which led to the VM operating system for the IBM System/370 (S/370) and later the System/390 (S/390). This chronology is discussed by Andrew Tanenbaum and William Stallings, as well as on GMU's Web site under the history of virtual machines. Knuth's MMIX, the hypothetical 64-bit RISC computer he designed to succeed MIX in The Art of Computer Programming, a classic text in computer science, is another example.

    Figure: Schematic Architecture of a Virtual Machine
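A minimal sketch of the interpreter idea, assuming a made-up instruction set: the ToyOp opcodes and the ToyVm class below are hypothetical and are not MSIL. The loop fetches each instruction in turn and updates an evaluation stack, which is roughly what the interpreter inside a VM does. The CLR itself compiles IL to native code rather than interpreting it, but IL is likewise defined in terms of an evaluation stack.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical opcodes for a toy VM; these are NOT real MSIL instructions.
public enum ToyOp { Push, Add, Mul, Print, Ret }

public struct ToyInstruction
{
    public ToyOp Op;
    public int Operand;   // only used by Push

    public ToyInstruction(ToyOp op, int operand)
    {
        Op = op;
        Operand = operand;
    }
}

public static class ToyVm
{
    // Fetch-decode-execute loop over an evaluation stack.
    // Returns the value on top of the stack when Ret is reached.
    public static int Run(ToyInstruction[] program)
    {
        Stack<int> stack = new Stack<int>();
        for (int pc = 0; pc < program.Length; pc++)
        {
            ToyInstruction ins = program[pc];
            switch (ins.Op)
            {
                case ToyOp.Push:  stack.Push(ins.Operand); break;
                case ToyOp.Add:   stack.Push(stack.Pop() + stack.Pop()); break;
                case ToyOp.Mul:   stack.Push(stack.Pop() * stack.Pop()); break;
                case ToyOp.Print: Console.WriteLine(stack.Peek()); break;
                case ToyOp.Ret:   return stack.Pop();
            }
        }
        throw new InvalidOperationException("Program ended without Ret.");
    }

    public static void Main()
    {
        // Computes (2 + 3) * 4.
        ToyInstruction[] program =
        {
            new ToyInstruction(ToyOp.Push, 2),
            new ToyInstruction(ToyOp.Push, 3),
            new ToyInstruction(ToyOp.Add, 0),
            new ToyInstruction(ToyOp.Push, 4),
            new ToyInstruction(ToyOp.Mul, 0),
            new ToyInstruction(ToyOp.Print, 0),
            new ToyInstruction(ToyOp.Ret, 0)
        };
        Run(program);
    }
}
```

Running Main prints 20. A real VM layers verification, a type system and memory management on top of a loop like this one.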

    The idea was widely and commercially popularized by Sun's introduction of Java and its slogan "write once, run anywhere", and the JVM (Java Virtual Machine) became the best-known model of virtual machine-based execution. Deploying and executing the same code on machines with disparate architectures and processor types, without any extra effort, is a developer's dream come true. Although the approach drew criticism over program speed and code vulnerability, Java kept thriving with its machine-independent bytecode and JVM.

    Code exposure is inherent in the architecture of virtual machines, whether the deliverables are Java classes or .NET assemblies. Code isn't compiled into machine code but into an intermediate form, which the virtual machine later executes. This intermediate language retains much of the information in the original source and can be transformed back into something very close to the original source code. If portability is the goal, this intermediate step can't be avoided, and it is exactly what makes reverse engineering easier. Because Java is the established forerunner in the use of virtual machines, much has already been written about the JVM architecture and about code obfuscation, the main topic of this article, for Java. I'll therefore focus on the .NET platform. For further reading about the JVM, refer to the references section.

    In Microsoft's .NET Framework, the fundamental unit of deployment and execution is the assembly, which bundles managed types and their resources. Managed code is simply code that targets, and is executed by, the Common Language Runtime (CLR). Managed code offers various benefits, such as automatic memory management through garbage collection, thread management and type safety, but these are beyond the scope of this article. What matters here is that managed code carries rich metadata, and that metadata helps disassemblers reverse engineer the intermediate language code and recover the original source. Microsoft's counterpart of Java bytecode is MSIL, or Microsoft Intermediate Language. It is Microsoft's implementation of ECMA's Common Intermediate Language (ECMA, originally the European Computer Manufacturers Association, is a European standards organization; its CLI standard, which defines CIL, was also adopted by ISO). The subtle differences between these standards are described in Don Box's Essential .NET, which is well worth reading.
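To illustrate how much of the source survives compilation, the following sketch uses reflection on a hypothetical PriceCalculator class (invented for this example; MetadataPeek is likewise a made-up name). It prints a private field's type and name and a method's parameter names, all of which are stored in the assembly's metadata and are therefore available to any decompiler. Local variable names, by contrast, are not part of the metadata; they are kept only in the separate debug symbols (.pdb) file, if one is shipped.

```csharp
using System;
using System.Reflection;

// Hypothetical class standing in for "valuable" proprietary source code.
public class PriceCalculator
{
    private decimal discountRate = 0.1m;

    public decimal ApplyDiscount(decimal listPrice, int customerTier)
    {
        return listPrice * (1 - discountRate * customerTier);
    }
}

public static class MetadataPeek
{
    // Returns the parameter names that the compiler recorded in metadata.
    public static string[] ParameterNames(Type type, string methodName)
    {
        ParameterInfo[] parameters = type.GetMethod(methodName).GetParameters();
        string[] names = new string[parameters.Length];
        for (int i = 0; i < parameters.Length; i++)
            names[i] = parameters[i].Name;
        return names;
    }

    public static void Main()
    {
        Type t = typeof(PriceCalculator);

        // Private field names are stored in metadata too.
        foreach (FieldInfo f in t.GetFields(BindingFlags.Instance | BindingFlags.NonPublic))
            Console.WriteLine("field: " + f.FieldType.Name + " " + f.Name);   // field: Decimal discountRate

        Console.WriteLine("parameters: " + string.Join(", ", ParameterNames(t, "ApplyDiscount")));
    }
}
```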

    Shared Source Common Language Infrastructure (Rotor)
    This discussion would not be complete without mentioning the Shared Source CLI, code-named Rotor. The Shared Source CLI is a source code implementation of the ECMA C# and CLI specifications. It is a freely downloadable, shared source implementation of the .NET runtime and C# compiler distributed by Microsoft; unlike open source software, its license permits only non-commercial use. Its supported platforms include FreeBSD, Mac OS X and, of course, Windows.

    More information on Rotor can be found on MSDN and its release could be downloaded from .

    Figure: Steps in compilation of a .NET source file.

    MSIL is converted into machine code by a JIT (just-in-time) compiler before it is invoked. JIT compilation is one of two common execution models. The other, pre-compilation, translates the whole program into a native memory image before it runs, whereas a JIT compiler translates each method only when it is first called, so only the code that actually runs is compiled into memory. JIT compilation also preserves portability, because the same IL can be compiled for whatever processor architecture the runtime is running on.

    The following listing shows the MSIL (IL for short) source code for a HelloWorld program.

    /* IL code for HelloWorld.exe */
    .assembly extern mscorlib {}              // Reference to the core library

    .assembly HelloWorld
    {
        .ver 1:0:0:0                          /* The assembly version */
    }

    .module HelloWorld.exe                    // Module declaration

    // Class declaration
    .class public auto ansi HelloWorld extends [mscorlib]System.Object
    {
        .method public static void HelloWorld() cil managed   // Static method declaration
        {
            .entrypoint                       // Marks this method as the program's entry point
            .maxstack 1
            ldstr "Hello World."              // Push a reference to the string onto the evaluation stack
            call void [mscorlib]System.Console::WriteLine(string)  // Call the static method to print it
            ret
        }
    }

    Listing: HelloWorld.il

    Readers familiar with 8086 assembly language will find IL quite similar: each IL mnemonic maps one-to-one to an IL opcode, much as assembly mnemonics map to machine instructions, although IL works on an evaluation stack rather than on registers such as the accumulator. Here, ldstr pushes a reference to a hard-coded string onto the stack, and the call instruction passes it to WriteLine, a static method of the System.Console class. IL can be written in any text editor and compiled with ILASM.exe, the IL assembler that comes with the .NET Framework. ILASM.exe generates a Portable Executable (PE) file from MSIL source, as can be seen in the screenshot below.

    Figure: Compilation of IL using ILASM.exe

    The .NET Framework ships with a collection of command-line (and GUI-based) tools, and you may find the complete list useful for reference. Detailed information and the MSIL specifications are also available on MSDN. To execute the HelloWorld.exe file created by ILASM.exe, simply type its file name at the command prompt, as shown below.

    Figure: Executing HelloWorld.exe

    After reading Simon Robinson's Advanced .NET Programming, I wrote an IL program and compiled it using ILASM.exe. I found myself as excited as I was when I first used TASM (Turbo Assembler) or MASM (Microsoft Macro Assembler), or when I coded inline assembly in Turbo C++ 3.0 in a university lab to change the monitor resolution by calling an interrupt. But where there is an ILASM, there is also an ILDASM, the IL Disassembler: the bad guy of this story. In the next example, we will see how to disassemble a VB .NET program.

    Imagine an ideal world where nothing is lost in translation, where everyone speaks the same language, or, to be more precise, the same second language. MSIL and the Microsoft .NET platform are that ideal world: csc.exe, the command-line compiler for C#, and vbc.exe, the compiler for Visual Basic .NET, both translate source code into MSIL to be executed on the CLR.

    ' Importing the System namespace
    Imports System

    ' Declaring the HelloWorld namespace
    Namespace HelloWorld

        ' Declaring the HelloWorld class
        Class HelloWorld

            Shared Sub Main()
                Console.WriteLine("HelloWorld from VB") ' Calling the static (Shared) WriteLine method
            End Sub

        End Class

    End Namespace

    Listing: HelloWorld.vb

    The simple, self-explanatory code above just prints the string "HelloWorld from VB" to the console. To compile it, we use vbc.exe, which comes with the .NET Framework.

    Figure: Executing HelloWorld.

    This process may appear mundane, and you might be wondering what the point of this trivial exercise is. Open the Run dialog (or a Visual Studio .NET command prompt) and type ILDASM. Provided your path is set correctly, the following utility will run.

    Figure: ILDASMing the HelloWorld.Exe

    On opening HelloWorld.exe, which was just generated from HelloWorld.vb, you can see that its source code is largely exposed. The left pane lists the namespace, class and functions, which can be explored further in detail; ILDASM uses different icons to represent modules and their members. With this much information available from a deployment module, which in the pre-VM era would have been nearly unreadable machine code, translating it back to source code is entirely feasible. Namespace declarations, class signatures and function definitions can all easily be explored using ILDASM or third-party decompilers such as Lutz Roeder's .NET Reflector, the Salamander decompiler and Anakrino. A detailed listing can be found in the references section at the end of this article.
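The left pane of ILDASM can be approximated with a few reflection calls. The sketch below is not how ILDASM is implemented, and TreeView and Greeter are hypothetical names. It loads an assembly (the path given on the command line, or the running program itself) and lists each type with the methods it declares, including parameter types and names.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

namespace HelloWorld
{
    public static class TreeView
    {
        // Builds an ILDASM-like outline: each type's full name followed by
        // the methods it declares, with parameter types and names.
        public static List<string> Describe(Assembly asm)
        {
            const BindingFlags flags = BindingFlags.Public | BindingFlags.NonPublic |
                                       BindingFlags.Instance | BindingFlags.Static |
                                       BindingFlags.DeclaredOnly;
            List<string> lines = new List<string>();
            foreach (Type type in asm.GetTypes())
            {
                lines.Add(type.FullName);
                foreach (MethodInfo m in type.GetMethods(flags))
                {
                    ParameterInfo[] ps = m.GetParameters();
                    string[] parts = new string[ps.Length];
                    for (int i = 0; i < ps.Length; i++)
                        parts[i] = ps[i].ParameterType.Name + " " + ps[i].Name;
                    lines.Add("    " + m.ReturnType.Name + " " + m.Name +
                              "(" + string.Join(", ", parts) + ")");
                }
            }
            return lines;
        }
    }

    // Small stand-in type so the sketch has something of its own to inspect.
    public class Greeter
    {
        public void SayHello(string name) { Console.WriteLine("Hello, " + name); }

        public static void Main(string[] args)
        {
            // Inspect the assembly named on the command line, or this program itself.
            Assembly asm = args.Length > 0 ? Assembly.LoadFrom(args[0])
                                           : typeof(Greeter).Assembly;
            foreach (string line in TreeView.Describe(asm))
                Console.WriteLine(line);
        }
    }
}
```

For example, the outline of this program includes the line "    Void SayHello(String name)". Assembly.GetTypes throws a ReflectionTypeLoadException when some of an assembly's dependencies cannot be resolved, so a more robust tool would catch it and work with the types that did load.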

    To understand the need for code obfuscation, it is also important to understand what metadata a Portable Executable holds and how it is used. As mentioned earlier, the basic unit of deployment in .NET is the assembly. An assembly contains:

    • A Portable Executable containing the assembly manifest (mandatory)
    • Any number of optional Portable Executable modules
    • Any number of optional resource files
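These parts can be inspected from code. The following sketch uses standard System.Reflection members (AssemblyContents is a hypothetical class name) to print an assembly's name and version, its manifest module (the mandatory PE file), every module it contains, and the names of its manifest resources. A single-file assembly produced by csc or vbc typically reports just one module.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public static class AssemblyContents
{
    // Lists what an assembly is made of: its identity, the manifest module,
    // all modules (including any extra .netmodule files) and manifest resources.
    public static List<string> Summarize(Assembly asm)
    {
        List<string> lines = new List<string>();
        AssemblyName name = asm.GetName();
        lines.Add("Assembly: " + name.Name + ", version " + name.Version);
        lines.Add("Manifest module: " + asm.ManifestModule.Name);
        foreach (Module m in asm.GetModules())
            lines.Add("Module: " + m.Name);
        foreach (string res in asm.GetManifestResourceNames())
            lines.Add("Resource: " + res);
        return lines;
    }

    public static void Main(string[] args)
    {
        // Inspect the assembly named on the command line, or this program itself.
        Assembly asm = args.Length > 0 ? Assembly.LoadFrom(args[0])
                                       : typeof(AssemblyContents).Assembly;
        foreach (string line in Summarize(asm))
            Console.WriteLine(line);
    }
}
```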

    The Portable Executable format isn't new either; it has been with us since the introduction of Win32. It is an extended version of the Common Object File Format (COFF) introduced in Unix System V. On most Unix systems, COFF was later replaced by the Executable and Linkable Format (ELF). Microsoft's Portable Executable and Common Object File Format Specification is the reference for this topic. At the end of the article, I've also provided links for further study of the PE file format's structure, its verification and validity, its vulnerabilities, and its formal specification.

    Roughly speaking, a Portable Executable consists of a header followed by COFF-style sections; in a CLR module, the text section is further divided into the parts shown below.

    Table: CLR Module Format
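The outer layers of this format can be checked with a few byte reads. The sketch below follows the published PE/COFF layout: the file begins with an MS-DOS header starting with "MZ", the 4-byte field at offset 0x3C gives the file offset of the "PE\0\0" signature, and the COFF file header follows that signature, beginning with the Machine and NumberOfSections fields. PeHeaderReader is a hypothetical name, and the sketch stops there; locating the CLR header and metadata would additionally require walking the optional header's data directories.

```csharp
using System;
using System.IO;

public static class PeHeaderReader
{
    // PE/COFF fields are little-endian regardless of the host, so decode bytes explicitly.
    static int ReadInt32LE(byte[] b, int o)
    {
        return b[o] | (b[o + 1] << 8) | (b[o + 2] << 16) | (b[o + 3] << 24);
    }

    static ushort ReadUInt16LE(byte[] b, int o)
    {
        return (ushort)(b[o] | (b[o + 1] << 8));
    }

    // Returns { Machine, NumberOfSections } from the COFF file header.
    public static ushort[] ReadCoffHeader(byte[] image)
    {
        // MS-DOS header: starts with "MZ"; e_lfanew (offset of the PE signature) is at 0x3C.
        if (image.Length < 0x40 || image[0] != (byte)'M' || image[1] != (byte)'Z')
            throw new InvalidDataException("Missing MZ signature.");
        int peOffset = ReadInt32LE(image, 0x3C);

        // "PE\0\0" signature, immediately followed by the COFF file header.
        if (peOffset < 0 || peOffset > image.Length - 8 ||
            image[peOffset] != (byte)'P' || image[peOffset + 1] != (byte)'E' ||
            image[peOffset + 2] != 0 || image[peOffset + 3] != 0)
            throw new InvalidDataException("Missing PE signature.");

        ushort machine = ReadUInt16LE(image, peOffset + 4);   // e.g. 0x14C = x86
        ushort sections = ReadUInt16LE(image, peOffset + 6);  // number of section headers
        return new ushort[] { machine, sections };
    }

    public static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("usage: PeHeaderReader <path to .exe or .dll>");
            return;
        }
        ushort[] h = ReadCoffHeader(File.ReadAllBytes(args[0]));
        Console.WriteLine("Machine: 0x" + h[0].ToString("X") + ", sections: " + h[1]);
    }
}
```

Run against the HelloWorld.exe built by ILASM, it reports the target machine and the number of sections; IL-only assemblies that can run on any CPU are typically marked with machine 0x14C (x86).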

    To explore an assembly (an EXE or DLL in PE format), the .NET Framework provides the Reflection APIs, which are used to discover type definitions at runtime. They expose type information for use at both design time and runtime, and they fall into two groups, which MSDN describes as follows:

    "The System.Reflection.Emit namespace contains classes that allow a compiler or tool to emit metadata and Microsoft intermediate language (MSIL) and optionally generate a PE file on disk. The primary clients of these classes are script engines and compilers.

    The System.Reflection namespace contains classes and interfaces that provide a managed view of loaded types, methods, and fields, with the ability to dynamically create and invoke types."
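As a small example of the Emit side, the sketch below uses System.Reflection.Emit.DynamicMethod and ILGenerator to build, in memory, a method whose body consists of the same three instructions as HelloWorld.il: ldstr, a call to Console.WriteLine(string), and ret. EmitHello and BuildHello are hypothetical names; DynamicMethod requires .NET Framework 2.0 or later.

```csharp
using System;
using System.Reflection.Emit;

public static class EmitHello
{
    // Emits a method equivalent to the body of HelloWorld.il:
    //   ldstr "<message>"; call void Console::WriteLine(string); ret
    public static Action BuildHello(string message)
    {
        DynamicMethod method = new DynamicMethod("HelloWorld", typeof(void), Type.EmptyTypes);
        ILGenerator il = method.GetILGenerator();

        il.Emit(OpCodes.Ldstr, message);              // push the string reference
        il.Emit(OpCodes.Call,
                typeof(Console).GetMethod("WriteLine", new Type[] { typeof(string) }));
        il.Emit(OpCodes.Ret);                         // return to the caller

        // Finalize the method body and wrap it in a delegate.
        return (Action)method.CreateDelegate(typeof(Action));
    }

    public static void Main()
    {
        Action hello = BuildHello("Hello World.");
        hello(); // prints: Hello World.
    }
}
```

DynamicMethod never writes a file. Producing a PE file on disk, as the MSDN description mentions, goes through AssemblyBuilder instead, and support for saving differs between the .NET Framework and newer .NET versions.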

    To demonstrate the Reflection API, here's a simple example. I declare an integer and then obtain a Type object holding the type of x, which is System.Int32 (its Name property returns Int32). This shows that the type of a variable can be discovered at runtime. Even when x is boxed into an Object, GetType still reports System.Int32. Last but not least, I instantiate the instanceRetriever class itself and ask for its type; the Reflection API returns instanceRetriever.

    using System;
    using System.Reflection;

    class instanceRetriever
    {
        public static void Main(String[] args)
        {
            int x = 1;
            Type t = x.GetType();
            Console.WriteLine(t.Name);                     // Int32

            Object obj = x;                                // boxing: the runtime type is still Int32
            Console.WriteLine(obj.GetType().ToString());   // System.Int32

            Console.WriteLine(new instanceRetriever().GetType().ToString());   // instanceRetriever
        }
    }

    Listing: InstanceRetriever.cs

    Figure: Running InstanceRetriever

    This dynamic discovery of types is useful for late binding and for on-the-fly code generation and execution. In the next, more detailed example, I'll demonstrate with a C# application how to open an assembly and read its types and methods without using ILDASM; in effect, a simpler version of ILDASM. We'll call it PEManifest, the Portable Executable Manifestation Engine.
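Late binding can be sketched with Type.GetType, Activator.CreateInstance and MethodInfo.Invoke, where both the type and the method are chosen by name at runtime. Calculator, LateBinding and CallByName are hypothetical names used only for this illustration. Because this style of code depends on identifiers surviving in metadata, renaming them, as some obfuscators do, can break it unless such members are excluded from renaming.

```csharp
using System;
using System.Reflection;

// Hypothetical target class; the code below refers to it only through
// the strings "Calculator" and "Add".
public class Calculator
{
    public int Add(int a, int b) { return a + b; }
}

public static class LateBinding
{
    // Creates an instance of a type and calls one of its methods,
    // both looked up by name at runtime.
    public static object CallByName(string typeName, string methodName, params object[] args)
    {
        Type type = Type.GetType(typeName, true);          // throws if the type isn't found
        object instance = Activator.CreateInstance(type);  // uses the public parameterless constructor
        MethodInfo method = type.GetMethod(methodName);
        return method.Invoke(instance, args);
    }

    public static void Main()
    {
        Console.WriteLine(CallByName("Calculator", "Add", 2, 3)); // prints 5
    }
}
```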

    Eric Lippert writes in his Visual Basic .NET Code Security Handbook:

    "Source code ends up in the hands of outsiders in many ways. In a more security-conscious era, an increasing number of customers are demanding independent review of source code. It would be subpoenaed if you are sued, or fall into the hands of attackers if they successfully attack."
