Exploring Symbol Type Information with PdbXtract

Mandiant is introducing a new free tool today, PdbXtract™,
which allows you to browse and search PDB-type information.

PdbXtract allows you to explore symbolic type information as
extracted from Microsoft PDB files. This tool is primarily designed
for reverse engineering Windows-based applications and for exploring
the internals of Windows kernel components. You can download PdbXtract.

A program database (PDB) file is a binary file containing
program debug information in a Microsoft-proprietary format. This
file is produced by the compiler/linker when a program is built. The
information it contains is used by debuggers to debug a program and
can greatly assist a developer in debugging program issues by
resolving function pointers to symbolic names, for example.

Perhaps the most useful and richest source of debugging
information contained in PDBs is type data which holds detailed
information about data structures, constants, and other named
symbols. While this information is primarily used to debug program
components, it can also be used to gain insight into how core
operating system components work by observing both the format of the
data structure and how the structure is used.

PdbXtract is not a pure PDB parser. It only extracts type information, using
Microsoft’s Debug Interface Access (DIA) COM interface. If you are interested
in just parsing/dumping raw PDB information, there are a few
alternatives to DIA, including Volatility’s open source
pdbparse (http://code.google.com/p/pdbparse/)
and the PDB utility that comes with the Undocumented Windows 2000
Secrets book. However, most of the practical tools I have seen that
operate on PDBs use DIA, including Microsoft’s own Dia2dump, this
one (http://www.codeproject.com/Articles/37456/How-To-Inspect-the-Content-of-a-Program-Database-P)
and this one (http://www.ishani.org/web/articles/obsolete/pdb-cracking-tool/),
to name a few. To reiterate, PdbXtract does not parse or capture the
wealth of other information available in a PDB, including
functions, debug streams, modules, publics, globals, files, section
information, injected sources, source files, OEM-specific types,
compilands, and others.

The tools mentioned above are fine
for inspecting the contents of a single PDB. However, as
part of my job in R&D, we often have to use knowledge of type
information across all supported Windows operating systems to
implement features. For example, suppose you are dealing with partially
undocumented or "opaque" types (say you need to walk
the PEB’s InInitializationOrderModuleList to obtain a list of loaded
modules in a process), or you have full source type information but do
not want to tie your program to a specific version of those types as
implemented in the headers of the SDK you are compiling against. In
either case you probably want to just use static offsets such as:

PVOID NextMod = (PVOID)((DWORD_PTR)PebPtr + InInitOrderModList_Offset + Flink_Offset);

The problem has always been: how do I get the value of
InInitOrderModList_Offset for all platforms we support, taking into
account 32-bit/64-bit variations? The answer has always been:
use WinDbg (or, if you are interested in possibly-correct kernel
symbols only, you might use Matt Suiche’s Moonsols library (http://msdn.moonsols.com)).
Launch a VM for each OS you want to support, attach with a debugger,
and use the power of WinDbg to extract the type information. Well,
WinDbg’s magical "dt" command just relies on the PDB
information for the corresponding binary (after retrieving the
necessary symbol files from your local symbol store and optionally
the Microsoft public symbol server), so it stands to reason that we
should be able to do the same. The end goal is to make a searchable
database for all the exported types of OS binaries we care about, so
that we don’t have to constantly relive the tedium of doing this in
WinDbg.
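
To make the idea concrete, here is a minimal sketch of what consuming such a database from code might look like: a small table of offsets keyed by platform and bitness, plus the same pointer arithmetic as the one-liner above. Everything in it is an assumption for illustration. The names OffsetRow, kOffsetTable, find_offsets and flink_address are hypothetical, the platform labels are invented, and the offset values are placeholders, not real Windows offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row per (OS, architecture) pair. In practice the numbers would be
 * pulled from PDB type data; every value here is a made-up placeholder. */
typedef struct {
    const char *os_name;      /* hypothetical platform label */
    int         is_64bit;     /* 0 = 32-bit, 1 = 64-bit */
    size_t      list_offset;  /* offset of the list head in the parent struct */
    size_t      flink_offset; /* offset of Flink inside the list entry */
} OffsetRow;

/* Placeholder data only: these are NOT real Windows offsets. */
static const OffsetRow kOffsetTable[] = {
    { "example-os-a", 0, 0x10, 0x0 },
    { "example-os-a", 1, 0x20, 0x0 },
    { "example-os-b", 1, 0x28, 0x0 },
};

/* Return the row for the given platform, or NULL if it is not in the table. */
const OffsetRow *find_offsets(const char *os_name, int is_64bit)
{
    size_t count = sizeof kOffsetTable / sizeof kOffsetTable[0];
    for (size_t i = 0; i < count; i++) {
        if (kOffsetTable[i].is_64bit == is_64bit &&
            strcmp(kOffsetTable[i].os_name, os_name) == 0) {
            return &kOffsetTable[i];
        }
    }
    return NULL;
}

/* Same arithmetic as the one-liner: base + list offset + Flink offset. */
void *flink_address(void *base, const OffsetRow *row)
{
    assert(base != NULL && row != NULL);
    return (unsigned char *)base + row->list_offset + row->flink_offset;
}
```

A caller would detect the running OS and bitness once, call find_offsets, and refuse to continue if it returns NULL rather than guessing at a layout.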

Features

PdbXtract has two main features: exploring a
single PDB (PDB Explorer) and searching a library of PDBs for one or
more operating systems. PDB Explorer opens the PDB, parses type
information using DIA, and displays a list of all structs, unions
and enums. If you click on one of the types, a C-style struct (with
offsets) definition will be displayed in the text area to the right,
as shown below for the type IMAGE_FILE_HEADER.
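
For readers without the tool at hand, the sketch below prints a comparable offset-annotated listing for IMAGE_FILE_HEADER. It does not use DIA or a PDB: it mirrors the documented winnt.h layout with fixed-width types and lets the compiler compute the offsets. The names ImageFileHeader, FieldInfo, kFields and print_layout are hypothetical, and the output format is my own guess at a readable layout, not PdbXtract's actual display.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Portable mirror of IMAGE_FILE_HEADER as documented in winnt.h
 * (WORD = uint16_t, DWORD = uint32_t). */
typedef struct {
    uint16_t Machine;
    uint16_t NumberOfSections;
    uint32_t TimeDateStamp;
    uint32_t PointerToSymbolTable;
    uint32_t NumberOfSymbols;
    uint16_t SizeOfOptionalHeader;
    uint16_t Characteristics;
} ImageFileHeader;

/* The real structure is 20 bytes; fail the build if padding crept in. */
static_assert(sizeof(ImageFileHeader) == 20, "unexpected padding");

typedef struct {
    const char *type;
    const char *name;
    size_t      offset;
} FieldInfo;

static const FieldInfo kFields[] = {
    { "WORD",  "Machine",              offsetof(ImageFileHeader, Machine) },
    { "WORD",  "NumberOfSections",     offsetof(ImageFileHeader, NumberOfSections) },
    { "DWORD", "TimeDateStamp",        offsetof(ImageFileHeader, TimeDateStamp) },
    { "DWORD", "PointerToSymbolTable", offsetof(ImageFileHeader, PointerToSymbolTable) },
    { "DWORD", "NumberOfSymbols",      offsetof(ImageFileHeader, NumberOfSymbols) },
    { "WORD",  "SizeOfOptionalHeader", offsetof(ImageFileHeader, SizeOfOptionalHeader) },
    { "WORD",  "Characteristics",      offsetof(ImageFileHeader, Characteristics) },
};

/* Write a C-style definition with a hex offset comment on each field. */
void print_layout(FILE *out)
{
    fprintf(out, "struct IMAGE_FILE_HEADER { /* size 0x%zx */\n",
            sizeof(ImageFileHeader));
    for (size_t i = 0; i < sizeof kFields / sizeof kFields[0]; i++) {
        fprintf(out, "    /* +0x%02zx */ %-5s %s;\n",
                kFields[i].offset, kFields[i].type, kFields[i].name);
    }
    fprintf(out, "};\n");
}
```

Calling print_layout(stdout) prints one line per field. The difference with a PDB-driven tool is that the offsets there come from the symbol file for a specific binary rather than from whatever headers you happen to compile against.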

The library tab allows you to create and
search a library of PDB type information. I have created a library
for *most* of the operating systems we support for the following
important system binaries: kernel (ntoskrnl.exe, ntkrpamp, ntkrnlpa,
ntkrnlmp), ndis.sys, win32k.sys and hal (hal.dll, halaacpi.dll,
halmacpi.dll). You might ask why other system DLLs, such as
kernel32, user32, advapi32, etc., were not included. The answer is
that the corresponding PDBs for those binaries on the
symbol server are stripped of type information. Why? Because
Microsoft expects you to use their headers when you compile your
application, and thus your program’s PDB will have the
necessary type information.

The library included with
PdbXtract covers several of Microsoft’s major operating system
releases, but you can easily add more symbols. PdbXtract includes a
utility called PdbFetch that simply runs Microsoft’s symchk
utility to grab the symbols for the file names you supply. You pass
pdbfetch a text file that contains a list of full paths to the
system binaries you want to retrieve symbols for. PdbFetch creates
a "PDB set", which consists of the directory structure containing
the PDBs as created by symchk, plus a manifest.xml file that
summarizes the OS platform information. To use a PDB set in
PdbXtract, go to the library tab and click "new" if you
want to create a new library from the PDB set or "add" if
you want to add it to an existing library. Once you create/add a
PDB set to a library you can delete the set; the only thing that
matters is the SQLite .pdbx database that is created.

Perhaps
someone out there will find this useful and maybe create a
searchable web front end with the resulting SQLite database? The sky
is the limit. Let me know if you do by commenting below.

As a
final note, you might wonder why you can’t just download the entire
symbol packages from Microsoft, which include every symbol file on
the MS Symbol server, and create a ginormous library. Why is there a
requirement to acquire the PDBs using pdbfetch? The answer is that you
could do that, but it is data overload (several GB of PDBs) when you
will not care about 99% of them. Plus it is easier to capture OS
platform/build info at run time rather than guessing at it from the
name of the symbol package installer (PDBs give no indication of
what OS the corresponding binary originated from).

