Plan
managed modules
execution of managed code
metadata
deployment and assemblies
managed modules
code is compiled into managed modules
PE executables
contain headers (PE and CLR)
metadata
IL code
the PE and CLR headers
PE header
PE32 or PE32+ format
file type: GUI, CUI or DLL
build timestamp
most of the PE header is ignored for IL-only modules
CLR header
required version of CLR
flags
MethodDef token of the entry point method
location and size of metadata, resources
strong name
metadata overview
set of data tables describing what is defined in the module
additional information about what the module references
metadata is always embedded in the module it describes
(unlike separate IDL files or type libraries)
uses of metadata
no headers and library files needed
Visual Studio uses metadata for IntelliSense
CLR verification process
serialization
garbage collection
IL overview
stack-based, high-level (object-oriented) assembly language
CPU-independent, no registers
object-oriented features -- instantiate objects (newobj
instruction), call virtual methods (callvirt), load and store
fields (ldfld, stfld)
special purpose instructions for some types -- arrays (ldelem)
type-independent arithmetic instructions -- add, mul
instructions for loading and storing (constants, indirect, local
variables, arguments), e.g. ldstr, ldarg, ldloc, stloc
branching, labels, exception handling
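To make the stack-based model concrete, here is a minimal Python sketch of a toy interpreter for a hypothetical subset of IL. The opcode spellings (ldarg with an index operand, ldc.i4 with a value) are simplified assumptions: real IL also has short forms such as ldarg.0 and ldc.i4.2, and the CLR JIT-compiles IL rather than interpreting it.

```python
def run_il(instructions, args):
    """Evaluate a list of (opcode, operand) pairs using only an evaluation stack."""
    stack = []  # no registers: every operand lives on this stack
    for opcode, operand in instructions:
        if opcode == "ldarg":        # push the argument with the given index
            stack.append(args[operand])
        elif opcode == "ldc.i4":     # push an integer constant
            stack.append(operand)
        elif opcode == "add":        # pop two values, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif opcode == "mul":        # pop two values, push their product
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif opcode == "ret":        # the return value is on top of the stack
            return stack.pop()
        else:
            raise ValueError(f"unsupported opcode: {opcode}")
    raise ValueError("method ended without ret")

# roughly what a compiler might emit for: return (a + b) * 2;
program = [("ldarg", 0), ("ldarg", 1), ("add", None),
           ("ldc.i4", 2), ("mul", None), ("ret", None)]
print(run_il(program, [3, 4]))  # 14
```

Note that add and mul carry no operand type. They work on whatever numeric values are on the stack, which is what "type-independent" means above.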
modules and assemblies overview
the CLR actually works with assemblies, not with individual modules
assemblies are assembled from one or more module files
assembly is what we would call a component
loading the CLR
the CLR is loaded by a so-called runtime host (a native process)
typically -- the Windows shell, ASP.NET, Internet Explorer
there is an API to load the runtime into a process and run
managed code (COR API)
the runtime must be installed (MSCorEE.dll is present in the
system directory)
version of the runtime -- registry, CLRVer utility
Windows examines the EXE file's PE header and creates the
appropriate process type (32-bit or 64-bit)
Windows loads the matching version of MSCorEE.dll into the process
the process's primary thread calls a function in MSCorEE.dll that
initializes the CLR, loads the EXE assembly and calls its
entry point method
the same happens when an unmanaged process calls LoadLibrary
to load a DLL assembly
executing assembly code
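just before a method runs, the CLR detects all types referenced by its code
for each type an internal table is created with one entry per method,
initially pointing to a JIT compiler stub
when a method is called for the first time, its IL is found (using
metadata), verified and compiled to native code, and the table entry
is replaced with a pointer to the native code
only the first call pays the compilation cost; later calls go straight
to the native code
both the IL and the native code may or may not be optimized
(unoptimized code is mainly for debugging -- the compiler then emits
NOP instructions so that breakpoints can be set on them)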
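The stub-and-patch idea can be sketched in Python as follows. MethodTable, its entries and the compilation counter are hypothetical illustrations, and a Python function object merely stands in for the native code the JIT compiler would generate.

```python
class MethodTable:
    """Toy per-type table: each entry starts as a JIT stub and is patched on first call."""

    def __init__(self, il_bodies):
        self._il = il_bodies      # method name -> "IL" (here just a Python callable)
        self.compilations = 0
        # every entry initially points at a stub, not at compiled code
        self.entries = {name: self._stub_for(name) for name in il_bodies}

    def _stub_for(self, name):
        def jit_stub(*args):
            native = self._compile(name)   # find the IL, verify and compile it
            self.entries[name] = native    # patch the entry: later calls skip the stub
            return native(*args)
        return jit_stub

    def _compile(self, name):
        self.compilations += 1
        return self._il[name]              # pretend this is freshly generated native code

    def call(self, name, *args):
        return self.entries[name](*args)

table = MethodTable({"Square": lambda x: x * x})
print(table.call("Square", 3))  # 9  (first call goes through the stub)
print(table.call("Square", 4))  # 16 (entry now points straight at the "native" code)
print(table.compilations)       # 1
```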
why JIT compilation can be faster
the JIT compiler knows the target CPU at run time and can use
CPU-specific instructions
certain tests can be always false on the target platform, so the JIT
compiler can drop them
the JIT compiler could profile the running code and then recompile
it, reorganized for the observed behavior
verification
verification examines the IL code and checks if the code is safe
it simulates every possible control flow and verifies the stack
every method is called with correct number of parameters
that parameters have proper types
return values are used properly
based on metadata for methods
verified code cannot corrupt memory, so several applications
(AppDomains) can safely run in one process
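A minimal Python sketch of stack verification follows. It assumes straight-line code only, whereas the real verifier merges stack states over every control-flow path. It also assumes a tiny hypothetical instruction set and a hypothetical METADATA dictionary that stands in for method signatures read from metadata. The sketch tracks only the types on the stack, never the values.

```python
# Hypothetical stand-in for method signatures read from metadata:
# method name -> (parameter types, return type)
METADATA = {
    "Math.Max": (["int32", "int32"], "int32"),
    "Console.WriteLine": (["string"], "void"),
}

def verify(instructions, arg_types, return_type):
    """Simulate the evaluation stack with types only; return an error string or None."""
    stack = []
    for pc, (opcode, operand) in enumerate(instructions):
        if opcode == "ldarg":
            stack.append(arg_types[operand])
        elif opcode == "ldc.i4":
            stack.append("int32")
        elif opcode == "ldstr":
            stack.append("string")
        elif opcode == "add":
            if len(stack) < 2:
                return f"{pc}: stack underflow"
            b, a = stack.pop(), stack.pop()
            if (a, b) != ("int32", "int32"):   # simplification: int32 arithmetic only
                return f"{pc}: add on {a} and {b}"
            stack.append("int32")
        elif opcode == "call":
            params, ret = METADATA[operand]
            if len(stack) < len(params):      # correct number of parameters?
                return f"{pc}: {operand} needs {len(params)} arguments"
            args = stack[len(stack) - len(params):]
            del stack[len(stack) - len(params):]
            if args != params:                # proper parameter types?
                return f"{pc}: {operand} expects {params}, got {args}"
            if ret != "void":                 # the return value goes on the stack
                stack.append(ret)
        elif opcode == "ret":                 # return value used properly?
            expected = [] if return_type == "void" else [return_type]
            if stack != expected:
                return f"{pc}: ret with stack {stack}, expected {expected}"
            return None
        else:
            return f"{pc}: unknown opcode {opcode}"
    return "method does not end with ret"

ok = [("ldarg", 0), ("ldc.i4", 1), ("call", "Math.Max"), ("ret", None)]
bad = [("ldstr", "x"), ("ldc.i4", 1), ("call", "Math.Max"), ("ret", None)]
print(verify(ok, ["int32"], "int32"))  # None
print(verify(bad, [], "int32"))        # 2: Math.Max expects ['int32', 'int32'], got ['string', 'int32']
```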
unsafe code
any code containing embedded native code, unmanaged
pointers, methods returning managed pointers etc.
such code cannot be verified -- it is refused, or it runs unverified
only if the appropriate permission (skip verification) is granted
PEVerify.exe utility -- checks whether an assembly's IL code is verifiable
CTS and CLS quick revision
the application (assembly) consists of modules
each module consists of types
types consist of members (fields, methods, properties and
events)
everything has visibility -- types within the assembly (public or
internal), members within the type and assembly (private, protected,
internal, protected and internal, protected or internal, public)
CTS defines rules for inheritance, virtual methods, object
life-time
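language syntax and CTS type behavior are to be considered
separate (e.g. C++ supports multiple inheritance, but a CTS type
can have only one base class)
a single root of the hierarchy (System.Object) as a basic rule for inheritance
CLS: a subset of the CTS rules (constraints) for language interoperability
on a very basic level all members are fields and methods
(properties and events are compiled into methods)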
Unmanaged code interoperability
PInvoke -- functions in native dlls can be called directly
(using the DllImport attribute); all structures and data types
must be defined in managed code (StructLayout attribute)
managed code can use existing COM components (tlbimp
utility)
COM components can use managed code (tlbexp and regasm
utilities)
a very rich topic -- marshalling of types etc.
.NET framework design goals
Windows has been considered "unstable"
DLL hell
installation complexity
security problems
building types into modules
basic command line: csc /t:exe /r:MsCorLib.dll Program.cs
/t switch -- module type (exe, winexe, library, module)
/r switch -- referenced assemblies (MsCorLib.dll is referenced
automatically)
common switches can be put into a response file: csc @respfile
file.cs
by default csc also uses the local CSC.rsp and the global CSC.rsp
response files
Metadata
each module contains metadata
definition and reference tables
definition tables
ModuleDef, TypeDef, MethodDef, FieldDef, ParamDef,
PropertyDef, EventDef
reference tables
AssemblyRef, ModuleRef, TypeRef, MemberRef
use ILDasm.exe to inspect metadata
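ILDasm displays references into these tables as 32-bit metadata tokens such as 0x06000001. The sketch below assumes the ECMA-335 token layout: the table number sits in the high byte and the 1-based row index in the low three bytes. A small Python function can then decode tokens. The table numbers are from ECMA-335, and the names follow the outline above (ECMA-335 calls some of these tables Module, Field, Param, Event and Property).

```python
# ECMA-335 table numbers for the tables named above
TABLES = {
    0x00: "ModuleDef", 0x01: "TypeRef", 0x02: "TypeDef", 0x04: "FieldDef",
    0x06: "MethodDef", 0x08: "ParamDef", 0x0A: "MemberRef", 0x14: "EventDef",
    0x17: "PropertyDef", 0x1A: "ModuleRef", 0x23: "AssemblyRef",
}

def decode_token(token):
    """Split a 32-bit metadata token into (table name, 1-based row index)."""
    table = token >> 24            # high byte: which metadata table
    row = token & 0x00FFFFFF       # low three bytes: row number in that table
    return TABLES.get(table, f"table 0x{table:02X}"), row

print(decode_token(0x06000001))  # ('MethodDef', 1)
print(decode_token(0x23000002))  # ('AssemblyRef', 2)
```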
assemblies
modules are combined into assemblies, typically one managed
module per assembly
one module is considered primary -- it contains special
metadata called manifest
assembly defines reusable types
assembly is marked with version number
assembly can have security information associated
benefits of multimodule assemblies -- incremental download,
adding resources and data files, different programming
languages in one assembly
Manifest
one PE file in the assembly contains the assembly metadata,
this file is loaded first by the CLR
AssemblyDef, FileDef, ManifestResourceDef,
ExportedTypesDef
the manifest states that the file is a part of the assembly, the
modules do not reference the assembly
AL
to create an assembly use the C# compiler (csc) or the assembly linker (AL.exe)
modules are compiled with the /t:module switch
they have the .netmodule extension and are PE files of DLL type
the csc /addmodule switch adds a module to the assembly being created
AL combines the modules and creates a manifest-only module
assemblies and resources
al /embed or /link switches
csc /resource and /linkresource switches
/win32res switch
version resource information
resource information is added to the assembly
assembly-level attributes, e.g.
[assembly:AssemblyFileVersion("1.0.0.0")]
AssemblyInfo.cs in Visual Studio
version attributes: AssemblyFileVersion,
AssemblyInformationalVersion, AssemblyVersion (the only one
the CLR uses)
Major version, Minor version, Build number, Revision number
culture
assemblies containing code should have a neutral culture
satellite assemblies -- contain only resources
use AL to create them (/embed and /c switches)
in code use ResourceManager object
use [assembly:AssemblyCulture("en-US")] in code
two kinds of deployment
private -- in one installation directory, weak name (just file
name)
global deployment -- assemblies identified by strong name,
stored in the Global Assembly Cache (GAC)
global deployment suits shared assemblies, but violates the simple
installation goal
strong names and the SN utility
a strong name consists of
file name (without extension)
version number
culture
public key
the SN utility creates a public/private key pair: sn -k file.keys
sn -p keysfile pubkeyfile -- extracts the public key
sn -tp pubkeyfile -- displays the public key and its token
public key token -- a 64-bit hash of the public key
to sign the assembly use the csc /keyfile switch
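The public key token is commonly described as the last 8 bytes of the SHA-1 hash of the public key, in reverse byte order. The Python sketch below assumes that scheme. The input bytes are a placeholder, not a real key blob as written by sn -p.

```python
import hashlib

def public_key_token(public_key: bytes) -> str:
    """Last 8 bytes (64 bits) of the SHA-1 hash of the public key, reversed, as hex."""
    digest = hashlib.sha1(public_key).digest()   # 20-byte hash
    return digest[-8:][::-1].hex()               # 8 bytes -> 16 hex characters

# placeholder bytes only; a real key blob would be read from the public key file
print(public_key_token(b"placeholder public key bytes"))
```

The token is only a short fingerprint: assembly references can store it instead of the full public key to save space.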
the GAC and the Gacutil utility
GAC path -- C:\Windows\Assembly
the gacutil utility
gacutil /i -- install assembly
gacutil /u -- uninstall assembly
gacutil /l -- list assemblies
delayed signing
if you do not have the private key, use /delaysign with the public
key file only
use sn -Vr AssemblyName to skip signature verification, so that the
assembly can be installed in the GAC
later, use sn -R assembly keyfile to sign the assembly with the
private key
resolving type references
IL refers to a member
IL refers to a type
the TypeRef entry's scope is a ModuleRef, AssemblyRef or ModuleDef
if ModuleRef or ModuleDef - load the type from the
appropriate module (file)
if AssemblyRef
if weakly named - search the AppBase
if strongly named - search the GAC and then the AppBase
load the assembly's manifest file and use its ExportedTypesDef to find the type
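The resolution rules above can be sketched as a Python decision function. The TypeRef class, the resolve function and the gac/app_base sets of assembly names are hypothetical simplifications. Versions, cultures, configuration files and the ExportedTypesDef lookup are left out.

```python
from dataclasses import dataclass

@dataclass
class TypeRef:
    name: str
    scope_kind: str              # "ModuleDef", "ModuleRef" or "AssemblyRef"
    scope_name: str              # module file name or assembly name
    strongly_named: bool = False

def resolve(type_ref, gac, app_base):
    """Return where the type would be loaded from, following the rules above."""
    if type_ref.scope_kind == "ModuleDef":
        return "this module"
    if type_ref.scope_kind == "ModuleRef":
        return f"module file {type_ref.scope_name} of this assembly"
    if type_ref.scope_kind != "AssemblyRef":
        raise ValueError(f"unknown scope: {type_ref.scope_kind}")
    if type_ref.strongly_named:
        places = [("GAC", gac), ("AppBase", app_base)]   # GAC first, then AppBase
    else:
        places = [("AppBase", app_base)]                 # weak names: AppBase only
    for place, assemblies in places:
        if type_ref.scope_name in assemblies:
            # a real loader would now read this assembly's manifest (ExportedTypesDef)
            return f"{place}: {type_ref.scope_name}"
    # the CLR reports a FileNotFoundException in this situation
    raise FileNotFoundError(type_ref.scope_name)

gac = {"System.Xml"}
app_base = {"MyLib", "System.Xml"}
print(resolve(TypeRef("System.Xml.XmlReader", "AssemblyRef", "System.Xml", True), gac, app_base))
print(resolve(TypeRef("MyLib.Widget", "AssemblyRef", "MyLib"), gac, app_base))
```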
Memory management
The program resources consume memory
they are stored on the thread's stack or in the managed heap
every type identifies some resource available to the program
the lifetime of a resource
1 new memory is allocated (newobj)
2 it is initialized (.ctor)
3 resource is used by the application
4 tear down the state (Dispose pattern)
5 free the memory (Garbage collector)
Advantages over native programming
no need to compute the size of allocated memory
no need to worry about freeing the memory
so no memory leaks or accesses to freed memory
but many resources still need to be closed explicitly --
system handles (files, tokens etc.)
New objects creation
CLR allocates resources on the managed heap
it is similar to the C-runtime heap, but it is managed
completely by CLR
when a process is initialized, the CLR reserves a contiguous region of
address space
CLR maintains a pointer (NextObjPtr), initially set to the
base address of the region
when newobj instruction is called, the CLR
1 calculates the bytes required for the type's fields and the fields of all its base types
2 adds the bytes needed for the object overhead (type object pointer and
sync block index: 8 bytes in a 32-bit, 16 bytes in a 64-bit process)
3 checks if there is enough free memory in the reserved region
4 if so, the object is placed at NextObjPtr and its bytes are zeroed out,
the type's instance constructor is called (with NextObjPtr as this),
NextObjPtr is advanced by the object's size and the object's address
is returned
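The allocation steps above can be sketched in Python with a bytearray as the reserved region. ManagedHeap and its newobj method are hypothetical names. The 8-byte overhead assumes a 32-bit process. When the region is full, the sketch simply raises MemoryError instead of starting a collection.

```python
OBJECT_OVERHEAD = 8  # type object pointer + sync block index, 32-bit process (16 on 64-bit)

class ManagedHeap:
    """Toy model of the reserved region and NextObjPtr-based allocation."""

    def __init__(self, size):
        self.memory = bytearray(size)   # the reserved contiguous region
        self.next_obj_ptr = 0           # starts at the base address of the region

    def newobj(self, field_bytes, ctor=None):
        size = field_bytes + OBJECT_OVERHEAD            # steps 1 and 2
        if self.next_obj_ptr + size > len(self.memory):   # step 3
            raise MemoryError("region full: a real CLR would start a garbage collection")
        address = self.next_obj_ptr
        self.memory[address:address + size] = bytes(size)  # zero the object's bytes
        if ctor is not None:
            ctor(self.memory, address)                     # constructor receives 'this'
        self.next_obj_ptr += size                          # allocation is a pointer bump
        return address

heap = ManagedHeap(64)
a = heap.newobj(4)    # 12 bytes at address 0
b = heap.newobj(12)   # 20 bytes at address 12
print(a, b, heap.next_obj_ptr)  # 0 12 32
```

Consecutive allocations land next to each other, which is the locality advantage described next.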
Advantages over C-runtime heap
allocating memory means simply adding to a pointer (in C the
linked list of records must be walked)
objects are allocated contiguously (in C, consecutively
created objects can be separated by megabytes of
memory)
if you create object with strong relationship consecutively, it
can improve performance (FileStream and BinaryWriter)
but there must be a mechanism to ensure that there is always
enough free space: garbage collection
Garbage collection
a mechanism to find objects no longer needed by the
application and reclaim their memory
it is typically triggered when there is not enough memory on the
heap (during newobj, NextObjPtr plus the object size would be
an address beyond the end of the reserved region)
if there is not enough memory after the garbage collection
ends, exception is thrown (OutOfMemoryException)
Garbage collection
an application has a set of roots: storage locations containing a
memory pointer to a reference-type object (or null)
local variables, static fields and method parameters of
reference types are roots
when a garbage collection starts, the GC walks the thread's stack,
determining the roots with the tables the JIT compiler produced for each method
mark phase: it then iterates through the roots and marks all
objects referenced by them (following in-object references
recursively; an already-marked object is not marked again)
compact phase: the memory of all unmarked objects is reclaimed,
surviving objects are shifted down in memory to keep the heap compact
all roots references are updated to the shifted addresses
NextObjPtr is updated accordingly
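Here is a minimal Python sketch of the mark and compact phases. It models a single-threaded heap as a dictionary from address to object. The roots are passed in directly, whereas the real GC finds them by walking stacks with the JIT tables. Generations are not modeled. Obj and collect are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class Obj:
    size: int
    refs: list = field(default_factory=list)   # addresses of referenced objects

def collect(heap, roots):
    """Mark-compact over {address: Obj}; returns (new_heap, new_roots, next_obj_ptr)."""
    # mark phase: follow references from every non-null root, never marking twice
    marked = set()
    pending = [r for r in roots if r is not None]
    while pending:
        addr = pending.pop()
        if addr in marked:
            continue
        marked.add(addr)
        pending.extend(heap[addr].refs)

    # compact phase: slide survivors down in address order, recording new addresses
    forward = {}
    next_obj_ptr = 0
    for addr in sorted(heap):
        if addr in marked:
            forward[addr] = next_obj_ptr
            next_obj_ptr += heap[addr].size

    # update in-object references and roots to the shifted addresses
    new_heap = {forward[a]: Obj(heap[a].size, [forward[r] for r in heap[a].refs])
                for a in forward}
    new_roots = [None if r is None else forward[r] for r in roots]
    return new_heap, new_roots, next_obj_ptr

# object at 0 references 30; objects at 10 and 40 are unreachable garbage
heap = {0: Obj(10, [30]), 10: Obj(20), 30: Obj(10), 40: Obj(5, [0])}
new_heap, new_roots, ptr = collect(heap, [0, None])
print(sorted(new_heap), new_heap[0].refs, new_roots, ptr)  # [0, 10] [10] [0, None] 20
```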
Garbage collection
performance hit, but occurs only when generation 0 is full
the lifetime of an object is fully managed by CLR
no leaks, no access to freed memory
no memory fragmentation
an object referenced by a local variable may be collected before
the end of the method, once the variable is no longer used
in debug builds, the JIT compiler extends the lifetime to the end of the method
Finalization
last meal for the object before it is killed
used for freeing the unmanaged resources (file and other
operating system handles, network resources)
when the garbage collector determines that the object is
garbage, it first calls the object's Finalize method
C# uses the C++ destructor syntax (~ClassName)
the compiler emits the Finalize method with a try/finally block
whose finally part calls the base type's Finalize method
Finalization
Finalization occurs when
generation 0 is full
GC.Collect() method is called
Windows is reporting low memory
CLR unloads an appdomain
CLR shuts down
a special finalizer thread calls the Finalize methods; timeouts are
used in some cases (when the CLR is shutting down)
the GC maintains the finalization list (objects whose types define a
Finalize method)
during a collection, garbage objects found in the finalization
list are moved to the freachable queue; its references are
considered roots, so these objects survive the collection
the finalizer thread removes the objects from the queue and calls
their Finalize methods; their memory is reclaimed by a later collection
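The finalization list and the freachable queue can be sketched in Python as follows. FinalizingGC and its methods are hypothetical. Objects are plain names. The sketch does not model objects that are referenced by a finalizable object, even though those also survive in the real CLR.

```python
class FinalizingGC:
    """Toy model of the finalization list and the freachable queue."""

    def __init__(self):
        self.objects = set()            # objects still occupying heap memory
        self.finalization_list = set()  # objects whose type defines Finalize
        self.freachable = []            # objects waiting for their Finalize call
        self.log = []

    def allocate(self, name, has_finalizer=False):
        self.objects.add(name)
        if has_finalizer:
            self.finalization_list.add(name)   # registered when the object is created

    def collect(self, reachable):
        # entries in the freachable queue count as roots, so they survive
        roots = set(reachable) | set(self.freachable)
        for name in sorted(self.objects - roots):
            if name in self.finalization_list:
                self.finalization_list.remove(name)
                self.freachable.append(name)   # kept alive until Finalize has run
            else:
                self.objects.remove(name)      # memory reclaimed now
                self.log.append(f"reclaimed {name}")

    def run_finalizer_thread(self):
        while self.freachable:
            name = self.freachable.pop(0)
            self.log.append(f"Finalize {name}")  # afterwards nothing refers to it

heap = FinalizingGC()
heap.allocate("plain")
heap.allocate("file", has_finalizer=True)
heap.collect(reachable=[])    # "plain" reclaimed, "file" moved to freachable
heap.run_finalizer_thread()   # Finalize is called for "file"
heap.collect(reachable=[])    # now "file" is reclaimed as well
print(heap.log)  # ['reclaimed plain', 'Finalize file', 'reclaimed file']
```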
the algorithm
CLR GC is a Generational Garbage Collector
assumptions:
the newer object, the shorter lifetime
the older object, the longer lifetime
collecting a part of the heap is faster than collecting the whole heap
when the process starts the heap is empty
the budget size for generation 0 is selected (e.g. 256 KB, related to
the size of the CPU's L2 cache)
budget size is selected for generation 1 (say 2 MB)
budget size is selected for generation 2 (say 10 MB)
all newly allocated objects are in generation 0
when allocations exceed the generation 0 budget, a GC is started
algorithm continued
objects that survive are promoted to generation 1; only generation
0 is collected until generation 1 exceeds its budget
if generation 1 is full, it is also collected and surviving objects
are promoted to generation 2
only three generations are supported (0,1,2)
the GC is self-tuning (the smaller the budget size, the more
frequent the GCs)
e.g. the size of generation 0 can be halved if the lifetime of
objects is very short
if all generation 0 objects are garbage, the memory is freed
simply by moving NextObjPtr back (no compaction needed)
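Here is a toy Python sketch of generational promotion. It measures budgets in object counts rather than bytes, and the caller supplies a set of live objects instead of running a mark phase. Generation 2 collection and self-tuning are not modeled. GenerationalHeap and its budgets are hypothetical.

```python
class GenerationalHeap:
    """Toy generational collector: objects are names, budgets count objects."""

    def __init__(self, budgets=(4, 8)):
        self.gens = [[], [], []]         # generations 0, 1 and 2
        self.budgets = budgets           # budgets for generations 0 and 1
        self.collections = [0, 0, 0]

    def allocate(self, name, live):
        if len(self.gens[0]) >= self.budgets[0]:   # generation 0 budget exceeded
            self.collect(live)
        self.gens[0].append(name)        # new objects always start in generation 0

    def collect(self, live):
        # generation 1 is collected too only when it has reached its budget
        oldest = 1 if len(self.gens[1]) >= self.budgets[1] else 0
        # older generation first, so objects just promoted from 0 are not promoted twice
        for g in range(oldest, -1, -1):
            survivors = [o for o in self.gens[g] if o in live]
            self.collections[g] += 1
            self.gens[g] = []
            self.gens[g + 1].extend(survivors)     # survivors are promoted

heap = GenerationalHeap(budgets=(2, 2))
live = {"a", "c"}
for name in ["a", "b", "c", "d", "e", "f", "g"]:
    heap.allocate(name, live)
print(heap.gens)         # [['g'], [], ['a', 'c']]
print(heap.collections)  # [3, 1, 0]
```

The long-lived objects "a" and "c" end up in generation 2, where later generation 0 collections no longer examine them. This is the payoff of the assumptions listed above.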