Introduction
Recently there has been a sudden increase in new development projects for migrating existing C/C++ applications written on the UNIX/Windows platform to .NET. There are a variety of reasons for that, foremost among them being the desire to utilize the wide range of capabilities the .NET Framework offers, such as Web Services, XML support, seamless database access infrastructure using ADO.NET, built-in features like security, versioning, etc. But the task of converting huge systems involving heterogeneous components interacting with each other is easier said than done. This article aims to describe:
- Some practical issues involved in migrating to the .NET platform, especially focusing on applications that interact with DB2 mainframes
- The features and capabilities available in the .NET Framework for solving the issues
- Architecture of the solution and the actual implementation
Target Audience
This article assumes that readers are familiar with the basics of the .NET Framework and C#, and have some knowledge of standard C++.
Motivation
This article describes the issues involved in processing huge binary data received from DB2 mainframes on the .NET platform. The issues we will address are:
- Providing a mechanism to extract raw binary data as big as 1 MB
- Reusing the existing mappings from C++ structures to binary data
- Wrapping the existing C++ data structure mappings into managed code in the .NET platform, thereby exposing the entire functionality in a language-independent format
- Executing stored procedures in DB2 that return more than 8 KB of data using the DB2 providers for the .NET platform
Features and Capabilities in .NET
Now let's try to understand in brief the features and capabilities available in the .NET platform that we will use for achieving our objectives.
Managed/Unmanaged C++
Unmanaged/Standard C++
Standard C++ is one of the most widely used languages in the software industry, especially in applications where performance and flexibility are of utmost importance. Most systems software development has been done in C++. Unmanaged C++ in the Microsoft world is nothing but standard C++ used in the developer community. It has all the capabilities of the standard C++ and cool features like multiple inheritance and templates, which are not available in other .NET languages like C# and VB.
Managed C++
Since so much code has already been written in standard C++ and is still being written, managed C++ provides the mechanism to harness the full power of standard C++, along with all the capabilities and features available in the .NET Framework. In a sense it provides the best of both worlds to C++ developers. It is the only language that allows mixing managed C++ code and unmanaged C++ code in the same library or assembly. Managed C++ is the point where both worlds meet. It's a very powerful tool available in the .NET platform. It allows developers to expose their legacy code as managed code on the .NET platform seamlessly.
Some of the advantages of managed C++ are:
- Allows mixing the managed and unmanaged code in the same file
- Allows developers to keep performance-sensitive functionality as native C++ code and expose it via managed C++ wrapper code.
- Allows creating objects and calling functions that live in unmanaged libraries such as standard C++ DLLs.
- The IL code generated by the managed C++ compiler has the best performance among all the .NET language compilers.
Managed DB Providers for .NET
A classic .NET managed data provider is used for connecting to a database, executing commands, and retrieving results. The .NET data providers are designed to be lightweight, creating a minimal layer between the data source and your code, increasing performance without sacrificing functionality.
There are several managed DB providers available for .NET from Microsoft. ADO.NET itself provides the following three database client namespaces:
- OLE DB Managed Provider Interface - This is the most generic interface provided by .NET, allowing connection to any database that has an OLE DB interface. If we use this interface there is a limitation on the maximum size of any parameter in a stored procedure call, i.e., no parameter can be more than 8 KB in size. This component resides in the System.Data.OleDb namespace.
- Database-specific Client Managed Provider Interface - These providers are specific to database vendors, like Microsoft's SQL Server specific .NET component residing in the System.Data.SqlClient namespace or Oracle's .NET component residing in the System.Data.OracleClient namespace. Similarly, DB2 also provides a .NET component, but it's still in the beta stage.
- ODBC .NET Managed Provider Interface - ODBC is the lowest layer in the ADO.NET hierarchy. This is the most flexible and vendor-independent interface provided by .NET. It's intended to work with all ODBC drivers, allowing connection to any database. Also there is no limitation on the size of the parameters. The classes for this component reside in the System.Data.Odbc namespace. For our DB2 mainframe connectivity we used the ODBC.NET managed provider interface.
Since ODBC is designed for maximum interoperability, there is a performance hit compared to database-specific client managed providers, which are optimized for a specific database.
So it might be better to use a database-specific client managed provider if there is one available to meet your requirements.
Architectural Overview of the Solution
By now we have a good idea regarding the issues and capabilities of .NET. Let's use an example scenario to describe the architecture and the solution.
In our sample application, customers can place their orders (be it for airline tickets, books, phone service etc.) from any front-end application:
- Web site
- Customer representative on the phone
- Hand-held devices
- Virtual Call Center applications
All ordering information is finally stored in the mainframe DB2 database.
Later on customers can query their account information, using any of the applications mentioned above, which internally retrieve the information from the mainframe legacy database.
The following architecture diagram depicts the various components involved in retrieving/modifying the data for different types of requests.
Retrieving information from the DB2 mainframe
Retrieving information from the DB2 mainframe involves the following steps:
1. Executing the DB2 stored procedure:
Since our DB2 stored procedure returns the results in raw format with sizes greater than 8 KB, we had to use the classes in the System.Data.Odbc namespace. The following is the sample code:
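using System.Data;
using System.Data.Odbc;

// Members of the data-access class
private string mDB2ConnString = "DSN=DB2V71;UID=db2admin;PWD=db2admin";
private string mDBSchema = "SampleSchema";

string GetDB2Data()
{
    // ODBC escape syntax for a stored procedure call with three parameters.
    string cmdText = "{ CALL " + mDBSchema + ".SampleSP(?,?,?) }";
    using (OdbcConnection conn = new OdbcConnection(mDB2ConnString))
    using (OdbcCommand cmd = new OdbcCommand(cmdText, conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        // Parameter names, types, sizes and values are placeholders (assumptions);
        // replace them with the actual signature of your stored procedure.
        cmd.Parameters.Add("@in1", OdbcType.VarChar, 32).Value = "";
        cmd.Parameters.Add("@in2", OdbcType.VarChar, 32).Value = "";
        OdbcParameter output = cmd.Parameters.Add("@out", OdbcType.VarChar, 1024 * 1024);
        output.Direction = ParameterDirection.Output;
        conn.Open();
        cmd.ExecuteNonQuery();
        // The third parameter carries the raw record data.
        return output.Value.ToString();
    }
}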
In the above function we created a connection to the DB2 mainframe database and, using the System.Data.Odbc classes, executed the stored procedure and retrieved the result as an output parameter. The output of this function is the raw data, which we will cast to a particular C/C++ data structure in the next section.
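ODBC defines an escape sequence for stored procedure calls, of the form { CALL name(?,?) }, with one question mark per parameter. As a minimal sketch of how such a command text could be assembled for any number of parameters, consider the standard C++ helper below. The function name buildOdbcCall and its signature are assumptions made for illustration; they are not part of any ODBC or .NET API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Builds the ODBC escape sequence for a stored procedure call with
// `paramCount` positional parameter markers.
// Hypothetical helper: name and signature are illustrative only.
std::string buildOdbcCall(const std::string& schema,
                          const std::string& procedure,
                          std::size_t paramCount)
{
    assert(!procedure.empty());          // a procedure name is required
    std::string text = "{ CALL ";
    if (!schema.empty()) {
        text += schema + ".";            // qualify with the schema when given
    }
    text += procedure + "(";
    for (std::size_t i = 0; i < paramCount; ++i) {
        if (i > 0) {
            text += ",";
        }
        text += "?";                     // one marker per parameter
    }
    text += ") }";
    return text;
}
```

Calling buildOdbcCall("SampleSchema", "SampleSP", 3) yields the text for a three-parameter call to the sample procedure; the parameters themselves still have to be bound on the command object.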
2. Casting the received raw data to static C/C++ mapping structures
The C++ structures to which the raw data maps were already available and were being used in the existing C++ code running on the UNIX platform. Since neither C# nor managed C++ allows explicit raw casting, we use unmanaged C++ to do the cast for us.
The following is the sample C/C++ structure to which our raw data maps:
namespace SampleUnManagedStructs {

    // Nested structures are defined first so that they are complete
    // types by the time the enclosing structures use them.
    typedef struct
    {
        char orderNum[9];
        char status[1];
        char city[5];
        char street[28];
        char state[3];
        char po_box[1];
        char zipCode[1];
        // ...
        // ...
    } AddressStruct;

    typedef struct
    {
        char orderNum[9];
        char status[1];
        char db2_act[5];
        char db_name[8];
        char sql_cd[3];
        char ordr_ind[1];
        char addrInd[1];
        char param1[1];
        char param2[1];
        char filler1[23];
        AddressStruct blgAddrData;
    } ComplexStruct;

    typedef struct
    {
        char orderNum[9];
        char status[1];
        char db2_act[5];
        char db_name[8];
        char sql_cd[3];
        char ordr_ind[1];
        char addrInd[1];
        ComplexStruct complex_struct;
        AddressStruct blgAddrData;
    } SampleOrderStatusStruct;

} // namespace SampleUnManagedStructs
Note that for simplicity's sake I have left out many fields in these structures. In the above structures, SampleOrderStatusStruct is a nested structure containing two other structures named ComplexStruct and AddressStruct, which themselves could be nested.
Casting is a very powerful feature available in the C language. It performs the memory mapping of the entire raw data in one shot. Of course it should be done only in situations where the mappings are known to work beforehand.
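One way to establish that a mapping is known to work is to let the compiler check the layout. The following standard C++ sketch assumes a shortened, hypothetical record (OrderHeader, holding only five of the sample fields) and an expected record size of 26 bytes derived from those field widths. The names OrderHeader, kOrderHeaderWireSize and hasFullRecord are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, shortened record: field names follow the sample
// structures, but the selection of fields is an assumption.
struct OrderHeader {
    char orderNum[9];
    char status[1];
    char db2_act[5];
    char db_name[8];
    char sql_cd[3];
};

// Expected size of the record on the wire: 9 + 1 + 5 + 8 + 3 bytes.
constexpr std::size_t kOrderHeaderWireSize = 26;

// Compile-time checks: the build fails if the compiler inserts padding
// or if a field is resized without updating the expected layout.
static_assert(sizeof(OrderHeader) == kOrderHeaderWireSize,
              "OrderHeader must match the record size byte for byte");
static_assert(offsetof(OrderHeader, status) == 9, "status offset");
static_assert(offsetof(OrderHeader, db2_act) == 10, "db2_act offset");
static_assert(offsetof(OrderHeader, db_name) == 15, "db_name offset");
static_assert(offsetof(OrderHeader, sql_cd) == 23, "sql_cd offset");

// Run-time check: the received buffer must hold at least one full record.
bool hasFullRecord(std::size_t receivedBytes)
{
    return receivedBytes >= kOrderHeaderWireSize;
}
```

Checks of this kind cost nothing at run time and turn a silent misalignment into a compile error.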
The following code snippet shows the code for casting the raw data received from DB2 to the above structure. All this code would be in the Managed C++ assembly where managed and unmanaged C++ code can reside together:
void ManagedSampleOrderStatusWrapper::castRawDataToUnmanagedStruct(String* rawData)
{
1   mpStruct = 0;   // will point into the unmanaged buffer allocated on line 3
2   char* charData;
3   charData = (char*)(void*)Marshal::StringToHGlobalAnsi(rawData);
4   mpStruct = (SampleUnManagedStructs::SampleOrderStatusStruct*)(charData);
}
In the above code, line 3 is the key step: the string data received from .NET is marshaled into an unmanaged, ANSI-encoded char buffer. That buffer is unmanaged memory and must eventually be released with Marshal::FreeHGlobal.
In line 4 the char buffer is cast to our C++ structure using a C-style explicit cast. As long as the field sizes and their order in the structure match the layout of the DB2 record, every field now lines up with the corresponding bytes of the raw data. From now on we can access specific data simply by referring to the corresponding field in the structure.
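The same mapping can be sketched in standard C++ without the managed marshaling step. The example below is a variant of the technique rather than the code used in the project: it copies the bytes into the structure with memcpy instead of casting the pointer, and it rejects buffers that are too short. OrderHeader, mapRawToStruct and fieldToString are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical three-field record (15 bytes) used only for this sketch.
struct OrderHeader {
    char orderNum[9];
    char status[1];
    char db2_act[5];
};

// Copies the leading bytes of `raw` into `out`.
// Returns false when the buffer is shorter than one record.
bool mapRawToStruct(const std::string& raw, OrderHeader& out)
{
    if (raw.size() < sizeof(OrderHeader)) {
        return false;
    }
    std::memcpy(&out, raw.data(), sizeof(OrderHeader));
    return true;
}

// Fixed-width fields are not NUL-terminated, so the width is always
// passed explicitly when a field is turned into a string.
std::string fieldToString(const char* field, std::size_t width)
{
    return std::string(field, width);
}
```

For a 15-byte input such as "123456789AINSRT", the first nine bytes land in orderNum, the next byte in status and the last five in db2_act.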
3. Wrapping the existing C++ data structure mappings into managed .NET code
Now that we have seen our raw data is aligned on proper boundaries of our C structure, we will define a managed C++ wrapper class that can expose the structure member attributes in a language-independent manner. The following is our sample wrapper class:
__gc class ManagedSampleOrderStatusWrapper
{
private:
    // Pointer to the unmanaged structure held by the wrapper
    SampleUnManagedStructs::SampleOrderStatusStruct* mpStruct;
    // Method that casts raw data to the unmanaged structure
    void castRawDataToUnmanagedStruct(String* rawData);
public:
    // Constructor that does the actual casting
    ManagedSampleOrderStatusWrapper(String* rawData);
    // The structure lives in memory obtained from Marshal::StringToHGlobalAnsi,
    // so it is released with Marshal::FreeHGlobal rather than delete.
    ~ManagedSampleOrderStatusWrapper() { Marshal::FreeHGlobal(IntPtr(mpStruct)); }
    // All the get methods for exposing the attributes of the C
    // structs, including the nested structures.
    String* GetOrderNumber();
    ...
    ...
    ...
};
In the above wrapper class we hold a pointer to the unmanaged C structure as a member attribute of the wrapper class. The actual casting is done within the constructor by calling the private method castRawDataToUnmanagedStruct().
Also all the member attributes of the structure including the nested structure are accessible via get methods.
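The wrapper idea is not tied to managed code. As a rough standard C++ sketch, a class can own a copy of the record and expose each fixed-width field through a getter. Everything here is an assumption made for illustration: the AddressRecord subset of fields, the AddressWrapper class and its getter names, and the choice to strip trailing blanks from each field.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>

namespace sketch {

// Hypothetical subset of the address fields (16 bytes in total).
struct AddressRecord {
    char city[5];
    char state[2];
    char zipCode[9];
};

class AddressWrapper {
public:
    // Copies the raw bytes into the record; throws if the data is too short.
    explicit AddressWrapper(const std::string& raw)
    {
        if (raw.size() < sizeof(AddressRecord)) {
            throw std::invalid_argument("raw data shorter than AddressRecord");
        }
        std::memcpy(&record_, raw.data(), sizeof(AddressRecord));
    }

    std::string city() const    { return trimmed(record_.city, sizeof record_.city); }
    std::string state() const   { return trimmed(record_.state, sizeof record_.state); }
    std::string zipCode() const { return trimmed(record_.zipCode, sizeof record_.zipCode); }

private:
    // Assumption: fields are blank-padded on the right, so trailing
    // spaces are dropped before the value is returned.
    static std::string trimmed(const char* field, std::size_t width)
    {
        std::size_t len = width;
        while (len > 0 && field[len - 1] == ' ') {
            --len;
        }
        return std::string(field, len);
    }

    AddressRecord record_;
};

} // namespace sketch
```

Callers see only strings and never touch the byte layout, which is the same separation the managed wrapper gives the C# business logic.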
Since the structures in real life have hundreds of member attributes, all the accessor method code was generated using Perl scripts.
Perl scripts are very useful for code generation with minimal effort. Perl provides many constructs for manipulating/parsing the input.
Sample Input Code:
char OrderNum[9];
char Status[1];
char City[5];
char Street[28];
char State[2];
char PO_BOX[1];
char ZipCode[9];
The following is a sample Perl script for auto generation of the managed C++ code:
Perl Script Code:
#!/usr/local/bin/perl -W
$sourceFile = "AddressStruct";
open(STRUCTURES, $sourceFile) or die "Can't open structure: $!\n";
$hfile = $sourceFile . ".h";
open(HFILE, ">" . $hfile) or die "Can't open $hfile: $!\n";
# write the fixed file header
print HFILE <<"HFILEEND";
/* -*- C++ -*-
This file was generated by the script MgdGetMethods.pl
Edit that instead of this.
*/
#using <mscorlib.dll>
#include "SampleUnmanagedStructs.h"
using namespace System;
using namespace System::Runtime::InteropServices;
using namespace SampleUnManagedStructs;
namespace ManagedSampleStructs
{
public __gc class Managed${sourceFile}Wrapper
{
HFILEEND
while ($line = <STRUCTURES>)
{
    ($varType, $varName) = split(/[ ;]/, $line);
    # get all arrays, weeding out arrays with more than 1 dimension
    next unless ($varName =~ /\[.*\]/ && $varName !~ /\]\[/);
    (@var_name_components) = split(/\[/, $varName);
    $newName = $var_name_components[0];
    # remove trailing ']' to get the number of elements in the array
    chop(@var_name_components);
    $num_elements = eval($var_name_components[1]);
    print "$newName: $num_elements \n";
    print HFILE "String* get_$newName()\n";
    print HFILE "{\n";
    print HFILE "return new String(mpStruct->$newName,0,$num_elements);\n";
    print HFILE "}\n";
}
# close the class and the namespace
print HFILE "};\n}\n";
close(HFILE);
close(STRUCTURES);
Here the input to the Perl script is a file containing a C struct, and the output is the get methods for all the attributes of the struct.
Sample output code generated from Perl script:
/* -*- C++ -*-
This file was generated by the script MgdGetMethods.pl
Edit that instead of this.
*/
#using <mscorlib.dll>
#include "SampleUnmanagedStructs.h"
using namespace System;
using namespace System::Runtime::InteropServices;
using namespace SampleUnManagedStructs;
namespace ManagedSampleStructs
{
public __gc class ManagedAddressStructWrapper
{
String* get_OrderNum()
{
return new String(mpStruct->OrderNum,0,9);
}
String* get_Status()
{
return new String(mpStruct->Status,0,1);
}
String* get_City()
{
return new String(mpStruct->City,0,5);
}
String* get_Street()
{
return new String(mpStruct->Street,0,28);
}
String* get_State()
{
return new String(mpStruct->State,0,2);
}
String* get_PO_BOX()
{
return new String(mpStruct->PO_BOX,0,1);
}
String* get_ZipCode()
{
return new String(mpStruct->ZipCode,0,9);
}
};
}
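The generation step itself is simple enough to sketch in standard C++ as well. The function below is not a port of the Perl script; it is a minimal illustration of the same idea, assuming the input consists of lines of the form "char Name[N];" and that anything else (including multi-dimensional arrays) is skipped. The name generateGetters and the exact formatting of the emitted text are assumptions.

```cpp
#include <cassert>
#include <regex>
#include <sstream>
#include <string>

// Emits one managed C++ getter per single-dimension char array field
// found in `structBody`. Lines that do not match are ignored.
std::string generateGetters(const std::string& structBody)
{
    // Matches lines such as "char City[5];" and captures name and size.
    static const std::regex field(
        R"(^\s*char\s+(\w+)\s*\[\s*(\d+)\s*\]\s*;\s*$)");

    std::istringstream in(structBody);
    std::ostringstream out;
    std::string line;
    while (std::getline(in, line)) {
        std::smatch m;
        if (!std::regex_match(line, m, field)) {
            continue;   // not a single-dimension char array
        }
        out << "String* get_" << m[1] << "()\n"
            << "{\n"
            << "    return new String(mpStruct->" << m[1]
            << ", 0, " << m[2] << ");\n"
            << "}\n";
    }
    return out.str();
}
```

Because the array size is taken directly from the declaration, the length passed to the String constructor cannot drift away from the field width, which is the main benefit of generating the accessors instead of writing them by hand.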
4. The C# Business Logic Component
The business logic can now access any of the information received from DB2 by simply calling get methods on the ManagedSampleOrderStatusWrapper wrapper class. The managed wrapper class hides all the gory details from the implementer of the business logic. The C# component can then expose the entire logic in any form it chooses, be it via a Web Service interface or just a simple interface via an assembly.
Conclusion
In this article we demonstrated how to manipulate and expose raw data received from mainframes into well-defined managed C++ interfaces using the facilities and features provided by the .NET Framework and managed C++. Also by keeping the performance-sensitive functionality as native C++ code, we can utilize the inherent speed advantage of the C++ compilers.
About the Author
Satender Saroha is a consulting software architect currently working at Verizon Communications. Satender has more than seven years of experience in architecting and designing distributed systems on Microsoft and UNIX platforms. He has worked in diverse fields ranging from Voice over IP soft switches, SS7, speech recognition software, embedded systems, IN Platforms to Web Ordering systems capable of handling millions of hits a day. He is an expert in distributed systems technologies like CORBA and .NET using C++ and C#.
He holds a Bachelor's degree in Computer Science Engineering from Indian Institute of Technology (IIT), Roorkee, India. He can be reached via ssaroha@yahoo.com.