<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.matthias.vallentin.net/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">
  <id>http://matthias.vallentin.net/</id>
  <title>Matthias Vallentin</title>
  <updated>2011-12-16T08:00:00Z</updated>
  <link rel="alternate" href="http://matthias.vallentin.net/" />
  
  <author>
    <name>Matthias Vallentin</name>
    <uri>http://matthias.vallentin.net/contact</uri>
  </author>
  <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.matthias.vallentin.net/blog-atom" /><feedburner:info uri="blog-atom" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><entry>
    <id>tag:matthias.vallentin.net,2011-12-16:/blog/2011/12/high-level-network-traffic-profiling/</id>
    <title type="html">High-Level Network Traffic Profiling</title>
    <published>2011-12-16T08:00:00Z</published>
    <updated>2011-12-16T08:00:00Z</updated>
    <link rel="alternate" href="http://feeds.matthias.vallentin.net/~r/blog-atom/~3/LKmVF_IScDk/" />
    <content type="html">&lt;p&gt;&lt;img src="http://matthias.vallentin.net/blog/2011/12/bro-workload.png" alt="Profiling Bro's event workload" style="float: right; margin-left: 10px; margin-bottom: 10px" align="right"&gt;

&lt;p&gt;The size of an event is given by the size of all its arguments (plus a little
bit of meta information, e.g., creation timestamp). How do we get such meta
data though? To date, Bro has limited meta programming support, which makes it
difficult to obtain these details. Thus I added a new &lt;em&gt;meta event&lt;/em&gt; to Bro,
which the core generates in addition to each regular event. Here is an example
that shows how to handle this event:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span class="k"&gt;event&lt;/span&gt; &lt;span class="n"&gt;meta_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;time&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;count&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"%s&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="s"&gt;%f\%d"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You can find a more complete example that writes information about all events
to a separate file &lt;code&gt;meta.log&lt;/code&gt; in my &lt;a href="https://github.com/mavam/brospects/blob/master/bro/meta.bro"&gt;brospects&lt;/a&gt; repository. To use
this meta programming functionality, you need to use the topic branch
&lt;code&gt;topic/matthias/meta-analysis&lt;/code&gt;. A fresh install might look like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;git clone --recursive git://git.bro-ids.org/bro.git
git checkout topic/matthias/meta-analysis
./configure --prefix=PREFIX  make  make install
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;To give a real world example of what you can do with this functionality, let us
analyze a trace captured at UC Berkeley (UCB) on October 17, 2011, at 2:35pm
for 10 minutes. At UCB&amp;#x2019;s network border, a Bro cluster of 26 nodes monitors two
10 Gbps links. The full packet trace has a size of 219 GB and contains 284
million packets in 6.5 million connections. When looking at the trace sizes on
the 26 nodes, we observe a median size of 8.1 GB (&lt;img src="http://tex.72pines.com/latex.php?latex=%5Csigma%20=%202.2"/&gt;) in a range
from 6.3 to 15 GB.&lt;/p&gt;

&lt;p&gt;Let us inspect one 7.7 GB trace of a single cluster node in more detail. The
two Figures below show the number of events per second, once over time and once
their empirical distribution.&lt;/p&gt;

&lt;p&gt;&lt;img src="http://matthias.vallentin.net/blog/2011/12/blade00-events.png" alt="Events per second" style="display: block; margin: 0 auto;"&gt;&lt;/p&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/blog-atom/~4/LKmVF_IScDk" height="1" width="1"/&gt;</content>
    <summary type="html">&lt;p&gt;When it comes to understanding your network traffic in terms of volume or packet rate, the obvious approach is to count packets or sum up their size. In this article, I describe a higher-level alternative to profiling network traffic based on events generated by the Bro network security monitor.&lt;/p&gt;</summary>
  <feedburner:origLink>http://matthias.vallentin.net/blog/2011/12/high-level-network-traffic-profiling/</feedburner:origLink></entry>
  <entry>
    <id>tag:matthias.vallentin.net,2011-06-22:/blog/2011/06/better-nids-performance-by-tracking-roaming-users/</id>
    <title type="html">Better NIDS Performance by Tracking Roaming Users</title>
    <published>2011-06-22T07:00:00Z</published>
    <updated>2011-06-22T07:00:00Z</updated>
    <link rel="alternate" href="http://feeds.matthias.vallentin.net/~r/blog-atom/~3/XUjO1F9X_GQ/" />
    <content type="html">&lt;p&gt;&lt;a href="http://codebutler.com/firesheep"&gt;Firesheep&lt;/a&gt; had a noticeable impact on several
large Web 2.0 players: Twitter, FaceBook, GitHub, Dropbox, Slicehost, and
others now offer either opt-in HTTPS or have switched to a full HTTPS
deployment in order to protect their users from session hijacking attacks. In
an &lt;a href="http://matthias.vallentin.net/blog/2010/10/taming-the-sheep-detecting-sidejacking-with-bro/"&gt;earlier article&lt;/a&gt; I described how to write a
&lt;a href="http://www.bro-ids.org"&gt;Bro&lt;/a&gt; script that passively detects Firesheep. In this
article, I discuss how tracking roaming users in DHCP-enabled networks can
improve the false positive rate of network intrusion detection system (NIDS)
detectors of this kind.&lt;/p&gt;

&lt;h1 id="passive-detection"&gt;Passive Detection&lt;/h1&gt;

&lt;p&gt;Passive monitoring is less intrusive than active approaches, such as the
&lt;a href="http://www.zscaler.com/blacksheep.html"&gt;Blacksheep&lt;/a&gt; Firefox extension which sends fake cookies and
looks whether a Firesheep user tries to utilize them, or the
&lt;a href="http://notendur.hi.is/~gas15/FireShepherd"&gt;FireShepherd&lt;/a&gt; command-line utility which simply floods the
network with zillions of fake cookies with the goal to crash Firesheep (and
thereby the browser). In general, such active &amp;#x201C;defense&amp;#x201D; mechanisms do not scale
to larger networks and may have even undesirable side-effects in affecting
their users. Therefore, I believe that purely passive detection mechanisms are
more attractive in practice.&lt;/p&gt;

&lt;p&gt;However, false positives are an issue with the passive sidejacking detector.
If a network uses DHCP, chances are that the same user receives a different IP
addresses at some point in time. But when the sidejacking detector has not yet
expired the cookie value associated with the old IP address and the cookie
reappears under a new IP address, a false alarm is generated. This happens due
to the lack of a crisp user definition: IP address are ephemeral in DHCP
networks and cannot be used to analyze user activity over time without taking
roaming into account.&lt;/p&gt;

&lt;h1 id="reducing-false-positives"&gt;Reducing False Positives&lt;/h1&gt;

&lt;p&gt;How should we ? The key insight is to keep track of all IP addresses
under which the same user appears, where a user is uniquely identified by its
MAC address. You might think that this is obvious, so why not always use MAC
addresses? For small coffeeshop networks, this would probably work just fine: a
typical deployment has a single broadcast domain and the NIDS can be part of
it. However, the issue is with larger networks, partitioned into several
broadcast domains, where traffic from access points is aggregated over a
hierarchy of switches. In such scenarios, the NIDS might not even see the MAC
addresses of the users because it is deployed upstream (usually at the border
or DMZ).&lt;/p&gt;

&lt;p&gt;With &lt;a href="http://www.bro-ids.org"&gt;Bro&lt;/a&gt;, it is possible to simply &lt;a href="http://www.icir.org/robin/papers/acsac05.pdf"&gt;ship&lt;/a&gt;
the missing layer-2 information to the machine on which the detector runs. This
works in monitoring infrastructures consisting of network-wide &lt;em&gt;sensors&lt;/em&gt; (or
&lt;em&gt;worker&lt;/em&gt; nodes) which can see and forward link-layer events to a central
&lt;em&gt;manager&lt;/em&gt; nodes that runs the scripts processing the events. Our &lt;a href="http://matthias.vallentin.net/papers/raid07.pdf"&gt;NIDS cluster
paper&lt;/a&gt; has more details on this architecture.&lt;/p&gt;

&lt;p&gt;Assuming that we can see the relevant broadcast traffic, how do we learn
IP-to-MAC mappings of roaming users?&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
    &lt;p&gt;Clearly, we can learn IP-to-MAC mappings by monitoring &lt;strong&gt;DHCP&lt;/strong&gt; traffic. In
particular, the DHCP ACKs contain the binding between the two addresses, as
discussed in the &lt;a href="http://matthias.vallentin.net/blog/2010/10/taming-the-sheep-detecting-sidejacking-with-bro/"&gt;previous sidejacking article&lt;/a&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;I was pleased when Jordi Ros-Giralt from Reservoir Labs sent me a patch that
incorporates another type of broadcast traffic: &lt;strong&gt;ARP&lt;/strong&gt;. Similar to DHCP
ACKs, ARP replies also contain the mapping from IP to MAC address.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;&lt;p&gt;Bro&amp;#x2019;s new &lt;a href="http://git.bro-ids.org/bro-scripts.git/blob_plain/HEAD:/roam.bro"&gt;&lt;code&gt;roam.bro&lt;/code&gt;&lt;/a&gt; script keeps track of these mappings in two
global tables:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span class="k"&gt;global&lt;/span&gt; &lt;span class="n"&gt;ip_to_mac&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;table&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;addr&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="p"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;read_expire&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;alias_expiration&lt;/span&gt; &lt;span class="p"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;synchronized&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;global&lt;/span&gt; &lt;span class="n"&gt;mac_to_ip&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;table&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="kt"&gt;set&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;addr&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;read_expire&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;alias_expiration&lt;/span&gt; &lt;span class="p"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;synchronized&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Event handlers for the &lt;code&gt;dhcp_ack&lt;/code&gt; and &lt;code&gt;arp_reply&lt;/code&gt; events populate these tables.
For example, here is the DHCP ACK handler:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span class="k"&gt;event&lt;/span&gt; &lt;span class="n"&gt;DHCP&lt;/span&gt;&lt;span class="nn"&gt;::&lt;/span&gt;&lt;span class="n"&gt;dhcp_ack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;connection&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;dhcp_msg&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mask&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;addr&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;router&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;dhcp_router_list&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lease&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;interval&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;serv_addr&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;addr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;local&lt;/span&gt; &lt;span class="n"&gt;ip&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;$&lt;/span&gt;&lt;span class="n"&gt;yiaddr&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;local&lt;/span&gt; &lt;span class="n"&gt;mac&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;$&lt;/span&gt;&lt;span class="n"&gt;h_addr&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ip_to_mac&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;ip_to_mac&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mac&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mac&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;mac_to_ip&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;mac_to_ip&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mac&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kt"&gt;set&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;mergeable&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  
    &lt;span class="k"&gt;add&lt;/span&gt; &lt;span class="n"&gt;mac_to_ip&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mac&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;One note for cluster users: the &lt;code&gt;&amp;amp;synchronized&lt;/code&gt; attribute makes sure that these
tables have the same content across all nodes in a cluster. The &lt;code&gt;&amp;amp;mergeable&lt;/code&gt;
attribute ensures that the address sets of the table &lt;code&gt;mac_to_ip&lt;/code&gt; are correctly
synchronized. That is, the local sets of the worker nodes are &lt;em&gt;merged&lt;/em&gt; at the
manager. You can think of it conceptually as creating the union of all sets
instead of just using the value of the last update that would override previous
updates. See our &lt;a href="http://matthias.vallentin.net/papers/raid07.pdf"&gt;NIDS cluster paper&lt;/a&gt; for more details about these
attributes. At this point I should mention that I factored the roaming logic
into a separate Bro script &lt;code&gt;roam.bro&lt;/code&gt; so that other scripts can also make use
of the provided data structures.&lt;/p&gt;

&lt;p&gt;Jordi&amp;#x2019;s patch also includes a new type of
&lt;a href="http://www.bro-ids.org/documentation/notices.html"&gt;Notice&lt;/a&gt;,
&lt;code&gt;SessionCookieRoamed&lt;/code&gt;, which is generated when the same MAC address reappears
under a different IP address. The Figure below shows the updated decision tree
of the sidejacking detector; red rectangles correspond to a Notice and round
nodes represent a decision.&lt;/p&gt;

&lt;p&gt;&lt;img src="http://matthias.vallentin.net/blog/2011/06/sidejack-decision-tree.png" alt="Sidejacking detector logic after Jordi's update." style="display: block; margin: 0 auto;"&gt;

&lt;p&gt;The latest version of the &lt;a href="http://git.bro-ids.org/bro-scripts.git/blob_plain/HEAD:/sidejack.bro"&gt;detector&lt;/a&gt; and the &lt;a href="http://git.bro-ids.org/bro-scripts.git/blob_plain/HEAD:/roam.bro"&gt;roaming
script&lt;/a&gt; are available from the &lt;a href="http://git.bro-ids.org/bro-scripts.git"&gt;Bro scripts git
repository&lt;/a&gt;.&lt;/p&gt;

&lt;h1 id="summary"&gt;Summary&lt;/h1&gt;

&lt;p&gt;This article describes how to reduce false positives of network-based detectors
that base their notion of accountability on IP addresses. I show how flexible
cross-layer NIDS, like Bro, can make such detectors more robust by
incorporating link-layer context (i.e., MAC addresses). Although the running
example is sidejacking, this improvement works in all scenarios where IP-level
state is kept across TCP connections.&lt;/p&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/blog-atom/~4/XUjO1F9X_GQ" height="1" width="1"/&gt;</content>
    <summary type="html">&lt;p&gt;&lt;a href="http://codebutler.com/firesheep"&gt;Firesheep&lt;/a&gt; had a noticeable impact on several
large Web 2.0 players: Twitter, FaceBook, GitHub, Dropbox, Slicehost, and
others now offer either opt-in HTTPS or have switched to a full HTTPS
deployment in order to protect their users from session hijacking attacks. In
an &lt;a href="/blog/2010/10/taming-the-sheep-detecting-sidejacking-with-bro/"&gt;earlier article&lt;/a&gt; I described how to write a
&lt;a href="http://www.bro-ids.org"&gt;Bro&lt;/a&gt; script that passively detects Firesheep. In this
article, I discuss how tracking roaming users in DHCP-enabled networks can
improve the false positive rate of network intrusion detection system (NIDS)
detectors of this kind.&lt;/p&gt;</summary>
  <feedburner:origLink>http://matthias.vallentin.net/blog/2011/06/better-nids-performance-by-tracking-roaming-users/</feedburner:origLink></entry>
  <entry>
    <id>tag:matthias.vallentin.net,2011-06-15:/blog/2011/06/omni-take-two/</id>
    <title type="html">Omni, Take Two</title>
    <published>2011-06-15T07:00:00Z</published>
    <updated>2011-06-15T07:00:00Z</updated>
    <link rel="alternate" href="http://feeds.matthias.vallentin.net/~r/blog-atom/~3/5qRGzZzycQI/" />
    <content type="html">&lt;p&gt;Did you know that 2 equals 1? Here is the &lt;a href="http://en.wikipedia.org/wiki/Mathematical_fallacy"&gt;proof&lt;/a&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;(1) X = Y                         ; Given
(2) X^2 = XY                      ; Multiply both sides by X
(3) X^2-Y^2 = XY-Y^2              ; Subtract Y^2 from both sides
(4) (X+Y)(X-Y) = Y(X-Y)           ; Factor
(5) X+Y = Y                       ; Cancel out (X-Y) term
(6) 2Y = Y                        ; Substitute X for Y, by equation 1
(7) 2 = 1                         ; Divide both sides by Y
                -- "Omni", proof that 2 equals 1
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If you think this is too easy, here is another neat proof spotted by my friend
Avital Steinitz while teaching a Signals and Systems course.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Consider the function &lt;img src="http://tex.72pines.com/latex.php?latex=f%20:%20%5Cmathbb%7BR%7D%20%5Cto%20%5Cmathbb%7BC%7D"/&gt; given by the rule
&lt;img src="http://tex.72pines.com/latex.php?latex=f(t)%20=%20e%5E%7Bi2%5Cpi%20t%7D"/&gt;, also known as the phaser with frequency &lt;img src="http://tex.72pines.com/latex.php?latex=2%5Cpi"/&gt;. We
show that this function is in fact the constant function &lt;img src="http://tex.72pines.com/latex.php?latex=1"/&gt;:&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p align="center"&gt;&lt;img src="http://tex.72pines.com/latex.php?latex=%5Cforall%20t%20%5Cin%20%5Cmathbb%7BR%7D:%20%0Af(t)%20=%20e%5E%7Bi2%5Cpi%20t%7D%20=%20%5Cleft(e%5E%7Bi2%5Cpi%7D%5Cright)%5Et%20=%201%5Et%20=%201."/&gt;&lt;/p&gt;

&lt;p align="right"&gt;&lt;img src="http://tex.72pines.com/latex.php?latex=%5Csquare"/&gt;&lt;/p&gt;

&lt;img src="http://feeds.feedburner.com/~r/blog-atom/~4/5qRGzZzycQI" height="1" width="1"/&gt;</content>
    <summary type="html">&lt;p&gt;Do you think proving that 2 equals 1 is too easy? My friend Avital Steinitz showcases another neat proof which demonstrates that the phaser function can indeed be the constant function 1.&lt;/p&gt;</summary>
  <feedburner:origLink>http://matthias.vallentin.net/blog/2011/06/omni-take-two/</feedburner:origLink></entry>
  <entry>
    <id>tag:matthias.vallentin.net,2011-06-14:/blog/2011/06/a-garden-variety-of-bloom-filters/</id>
    <title type="html">A Garden Variety of Bloom Filters</title>
    <published>2011-06-14T07:00:00Z</published>
    <updated>2011-06-14T07:00:00Z</updated>
    <link rel="alternate" href="http://feeds.matthias.vallentin.net/~r/blog-atom/~3/qOnrbkuOOmw/" />
    <content type="html">&lt;p&gt;In this article, I explain how &lt;strong&gt;Bloom filters&lt;/strong&gt; work and introduce several
variations that evolved as a result of extensive academic treatment of this
topic. Moreover, I introduce my own implementation of these Bloom filters,
&lt;a href="http://matthias.vallentin.net/libbf/"&gt;libBf&lt;/a&gt;, a header-only &lt;a href="http://en.wikipedia.org/wiki/C%2B%2B0x"&gt;C++11&lt;/a&gt;
library that comes with a BSD-style licence.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Whenever you have a set or list, and space is an issue, a Bloom filter may be
a useful alternative.&lt;br&gt;
&amp;#x2013;Mitzenmacher&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1 id="introduction"&gt;Introduction&lt;/h1&gt;

&lt;p&gt;A &lt;a href="http://en.wikipedia.org/wiki/Bloom_filter"&gt;Bloom filter&lt;/a&gt; is a randomized
&lt;a href="http://www.pittsburgh.intel-research.net/people/gibbons/talks-surveys/Synopsis-Data-Structures-Gibbons-Matias.pdf"&gt;synopsis data structure&lt;/a&gt; that allows for set membership queries.
Its space-efficient representation comes at the cost of &lt;em&gt;false positives&lt;/em&gt;,
i.e., elements can erroneously be reported as members of the set. In practice,
the huge space savings often outweigh the false positives if kept at a
sufficiently low rate.&lt;/p&gt;

&lt;p&gt;Bloom filters have received a great deal of attention not only by the research
community but also in practice. For example, &lt;a href="http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/bloom_filter.h?view=log"&gt;Google Chrome&lt;/a&gt; uses a
Bloom filter to represent a blacklist of dangerous URLs. Each time a user is
about to navigate to new page, the corresponding URL is mangled, hashed, and
then compared to a local Bloom filter that represents the set of all malicious
URLs.  If the Bloom filter reports that the URL is in the set, the browser
sends the hash of the URL to the Safebrowsing server to confirm that the URL is
indeed malicious to compensate a potential false positive. That is, all checks
are performed locally, but when the user surfs to a malicious URL, an extra
round trip to the Safebrowsing server occurs.&lt;/p&gt;

&lt;p&gt;Another example is the squid web proxy which uses Bloom filters to represent
&lt;a href="http://wiki.squid-cache.org/SquidFaq/CacheDigests"&gt;cache digests&lt;/a&gt;, which
caching servers use to periodically exchange the objects they contain. There
are many more examples of Bloom filter applications, for instance in
peer-to-peer applications, routing protocols, &lt;a href="http://en.wikipedia.org/wiki/IP_traceback"&gt;IP
traceback&lt;/a&gt;, resource location, etc.
Broder and Mitzenmacher give a &lt;a href="http://www.eecs.harvard.edu/~michaelm/postscripts/im2005b.pdf"&gt;good survey of network applications&lt;/a&gt;. &lt;/p&gt;

&lt;h1 id="bloom-filters"&gt;Bloom Filters&lt;/h1&gt;

&lt;p&gt;Before we delve into the discussion, let us agree on some common notation.&lt;/p&gt;

&lt;h2 id="terminology"&gt;Terminology&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Universe &lt;img src="http://tex.72pines.com/latex.php?latex=U"/&gt;
&lt;/li&gt;
  &lt;li&gt;
&lt;img src="http://tex.72pines.com/latex.php?latex=N"/&gt; distinct items&lt;/li&gt;
  &lt;li&gt;
&lt;img src="http://tex.72pines.com/latex.php?latex=k"/&gt; independent hash functions &lt;img src="http://tex.72pines.com/latex.php?latex=h_1,%5Cdots,h_k"/&gt;
&lt;/li&gt;
  &lt;li&gt;Vector &lt;img src="http://tex.72pines.com/latex.php?latex=V"/&gt; of &lt;img src="http://tex.72pines.com/latex.php?latex=m"/&gt; cells, i.e., &lt;img src="http://tex.72pines.com/latex.php?latex=m%20=%20%5Clvert%20V%20%5Crvert"/&gt;
&lt;/li&gt;
  &lt;li&gt;Set:
    &lt;ul&gt;
      &lt;li&gt;
&lt;img src="http://tex.72pines.com/latex.php?latex=S%20=%20%5C%7Bx_1,%5Cdots,x_n%5C%7D"/&gt; where &lt;img src="http://tex.72pines.com/latex.php?latex=x_i%5Cin%20U"/&gt; and &lt;img src="http://tex.72pines.com/latex.php?latex=%5Clvert%20S%5Crvert%20=%20n"/&gt;
&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Multiset / Stream:
    &lt;ul&gt;
      &lt;li&gt;
&lt;img src="http://tex.72pines.com/latex.php?latex=%5Cmathcal%7BS%7D%20=%20%5C%7Bx_1,%5Cdots,x_n%5C%7D"/&gt; where &lt;img src="http://tex.72pines.com/latex.php?latex=x_i%5Cin%20U"/&gt; and
&lt;img src="http://tex.72pines.com/latex.php?latex=%7C%5Cmathcal%7BS%7D%7C%20=%20n"/&gt;
&lt;/li&gt;
      &lt;li&gt;
&lt;img src="http://tex.72pines.com/latex.php?latex=C_x%20=%20%5Cleft%5C%7B%20c_%7Bh_1(x)%7D,%5Cdots,c_%7Bh_k(x)%7D%20%5Cright%5C%7D"/&gt;
counters of &lt;img src="http://tex.72pines.com/latex.php?latex=x"/&gt;
&lt;/li&gt;
      &lt;li&gt;
&lt;img src="http://tex.72pines.com/latex.php?latex=f_x%20="/&gt; multiplicity (frequency) of &lt;img src="http://tex.72pines.com/latex.php?latex=x%5Cin%5Cmathcal%7BS%7D"/&gt;
&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Bloom filter estimate denoted by hat:
&lt;img src="http://tex.72pines.com/latex.php?latex=%5Cwidehat%7BS%7D,%20%5Cwidehat%7B%5Cmathcal%7BS%7D%7D,%20%5Cwidehat%7Bf%7D_x,%20%5Cldots"/&gt;
&lt;/li&gt;
  &lt;li&gt;Probability of a false positive (FP):
&lt;img src="http://tex.72pines.com/latex.php?latex=%5Cphi_P%20=%20%5Cmathbb%7BP%7D%5Cleft(x%5Cin%20%5Cwidehat%7BS%7D%20%5Cvert%20x%5Cnotin%20S%5Cright)"/&gt;
&lt;/li&gt;
  &lt;li&gt;Probability of a false negative (FN):
&lt;img src="http://tex.72pines.com/latex.php?latex=%5Cphi_N%20=%20%5Cmathbb%7BP%7D%5Cleft(x%5Cnotin%20%5Cwidehat%7BS%7D%20%5Cvert%20x%5Cin%20S%5Cright)"/&gt;
&lt;/li&gt;
  &lt;li&gt;Capacity &lt;img src="http://tex.72pines.com/latex.php?latex=%5Ckappa"/&gt;, i.e., is the maximum number of items a Bloom filter
can hold until a given &lt;img src="http://tex.72pines.com/latex.php?latex=%5Cphi_P"/&gt; can no longer be guaranteed&lt;/li&gt;
  &lt;li&gt;A Bloom filter is &lt;em&gt;full&lt;/em&gt; when then number of added items exceeds
&lt;img src="http://tex.72pines.com/latex.php?latex=%5Ckappa"/&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id="basic"&gt;Basic&lt;/h2&gt;
&lt;p&gt;&lt;img src="http://matthias.vallentin.net/blog/2011/06/bf-basic.png" alt="The basic Bloom filter devised by Burton Bloom. To insert an item x, we set the corresponding positions in the bit vector to 1" style="float: right; margin-left: 10px; margin-bottom: 10px" align="right"&gt;
Burton Bloom introduced the original Bloom filter in 1970, which we refer to as
the &lt;strong&gt;basic Bloom filter&lt;/strong&gt;. It uses a bit vector &lt;img src="http://tex.72pines.com/latex.php?latex=V"/&gt; with &lt;img src="http://tex.72pines.com/latex.php?latex=%7CV%7C%20=%20m%20=%20O(n)"/&gt;
and &lt;img src="http://tex.72pines.com/latex.php?latex=k"/&gt; independent hash functions &lt;img src="http://tex.72pines.com/latex.php?latex=h_1,%20%5Cdots,%20h_k"/&gt; that map
items in &lt;img src="http://tex.72pines.com/latex.php?latex=U"/&gt; to the range &lt;img src="http://tex.72pines.com/latex.php?latex=[m]%20=%20%5C%7B1,%5Cldots,m%5C%7D"/&gt;. (Unlike in the
implementation, we start at index 1 for the formal treatment.) All bits in
&lt;img src="http://tex.72pines.com/latex.php?latex=V"/&gt; are initialized to 0. To insert an item &lt;img src="http://tex.72pines.com/latex.php?latex=x%5Cin%20S"/&gt;, we set the
bits at positions &lt;img src="http://tex.72pines.com/latex.php?latex=h_1(x),%20%5Cldots,%20h_k(x)"/&gt; in &lt;img src="http://tex.72pines.com/latex.php?latex=V"/&gt; to 1. To test whether
an item &lt;img src="http://tex.72pines.com/latex.php?latex=q%5Cin%20U"/&gt; is a member of &lt;img src="http://tex.72pines.com/latex.php?latex=%5Cwidehat%7BS%7D"/&gt;, we examine the bits
at positions &lt;img src="http://tex.72pines.com/latex.php?latex=h_1(q),%5Cdots,h_k(q)"/&gt; in &lt;img src="http://tex.72pines.com/latex.php?latex=V"/&gt;. If any of these bits is 0
we report &lt;img src="http://tex.72pines.com/latex.php?latex=q%5Cnotin%20%5Cwidehat%7BS%7D"/&gt;. Otherwise we report that 
&lt;img src="http://tex.72pines.com/latex.php?latex=q%5Cin%20%5Cwidehat%7BS%7D"/&gt;, although there remains some probability that
&lt;img src="http://tex.72pines.com/latex.php?latex=q%5Cnotin%20S"/&gt;. This type of error is a &lt;em&gt;false positive&lt;/em&gt; (FP) and also
known as &lt;em&gt;Bloom error&lt;/em&gt; &lt;img src="http://tex.72pines.com/latex.php?latex=E_B"/&gt;. It occurs because other elements in
&lt;img src="http://tex.72pines.com/latex.php?latex=S"/&gt; also map to the same positions.&lt;/p&gt;

&lt;p&gt;To compute the probability of a Bloom error, we start off with an empty bit
vector &lt;img src="http://tex.72pines.com/latex.php?latex=V"/&gt; and insert an item. This is the same as independently (and
uniformly) choosing &lt;img src="http://tex.72pines.com/latex.php?latex=k"/&gt; bits and setting them to 1. Thereafter, the
probability that a certain bit in &lt;img src="http://tex.72pines.com/latex.php?latex=V"/&gt; is still 0 is &lt;/p&gt;

&lt;p align="center"&gt;&lt;img src="http://tex.72pines.com/latex.php?latex=%5Cleft(1%20-%20%5Cfrac%7B1%7D%7Bm%7D%5Cright)%5Ek."/&gt;&lt;/p&gt;

&lt;p&gt;Afer &lt;img src="http://tex.72pines.com/latex.php?latex=n"/&gt; insertions, the probability that a certain bit is 1 is &lt;/p&gt;

&lt;p align="center"&gt;&lt;img src="http://tex.72pines.com/latex.php?latex=1%20-%20%5Cleft(1%20-%20%5Cfrac%7B1%7D%7Bm%7D%5Cright)%5E%7Bkn%7D."/&gt;&lt;/p&gt;

&lt;p&gt;Testing for membership involves hashing an item &lt;img src="http://tex.72pines.com/latex.php?latex=k"/&gt; times. Thus the
probability of a Bloom error is&lt;/p&gt;

&lt;p align="center"&gt;&lt;img src="http://tex.72pines.com/latex.php?latex=%5Cbegin%7Bequation%7D%0A%5Cmathbb%7BP%7D(E_B)%20=%20%5Cleft(1-%5Cleft(1-%5Cfrac%7B1%7D%7Bm%7D%5Cright)%5E%7Bkn%7D%5Cright)%5Ek%0A%5Capprox%20%5Cleft(1%20-%20e%5E%7B-kn/m%7D%5Cright)%5Ek%0A%5Cend%7Bequation%7D"/&gt;&lt;/p&gt;

&lt;p&gt;For fixed parameters &lt;img src="http://tex.72pines.com/latex.php?latex=m"/&gt; and &lt;img src="http://tex.72pines.com/latex.php?latex=n"/&gt;, the optimal value &lt;img src="http://tex.72pines.com/latex.php?latex=k%5E*"/&gt; that
minimizes this probability is &lt;/p&gt;

&lt;p align="center"&gt;&lt;img src="http://tex.72pines.com/latex.php?latex=k%5E*%20=%20%5Carg%5Cmin_k%5C;%5Cmathbb%7BP%7D(E_B)%20=%20%0A%5Cleft%5Clfloor%5Cfrac%7Bm%7D%7Bn%7D%5Cln%202%5Cright%5Crfloor"/&gt;&lt;/p&gt;

&lt;p&gt;For &lt;img src="http://tex.72pines.com/latex.php?latex=k%5E*"/&gt;, we have hence &lt;img src="http://tex.72pines.com/latex.php?latex=E_B%20=%20(0.619)%5E%7Bm/n%7D"/&gt;. Moreover, for a desired FP
probability &lt;img src="http://tex.72pines.com/latex.php?latex=%5Cphi_P"/&gt; we can compute the number of required bits by
substituting the optimal value of &lt;img src="http://tex.72pines.com/latex.php?latex=k"/&gt;:&lt;/p&gt;

&lt;p align="center"&gt;&lt;img src="http://tex.72pines.com/latex.php?latex=m%20=%20-%5Cfrac%7Bn%5Cln%20p%7D%7B(%5Cln%202)%5E2%7D."/&gt;&lt;/p&gt;

&lt;h2 id="multisets"&gt;Multisets&lt;/h2&gt;

&lt;p&gt;A basic Bloom filter can only represent a set, but neither allows for querying
the multiplicities of an item, nor does it support deleting entries. We use the
term &lt;em&gt;counting Bloom filter&lt;/em&gt; to refer to variants of Bloom filters that
represent multisets rather than sets. In libBf, we call a counting Bloom filter
a basic Bloom filter with width parameter &lt;img src="http://tex.72pines.com/latex.php?latex=w"/&gt;.&lt;/p&gt;

&lt;h3 id="counting"&gt;Counting&lt;/h3&gt;

&lt;p&gt;&lt;img src="http://matthias.vallentin.net/blog/2011/06/bf-counting.png" alt="Each cell in the counting Bloom filter has a fixed bit width w. To insert an item x, we increment the counters C_x. To remove an item y, we decrement its counters C_y" style="float: right; margin-left: 10px; margin-bottom: 10px" align="right"&gt;
In a &lt;a href="http://pages.cs.wisc.edu/~cao/papers/summarycache.html"&gt;counting Bloom filter&lt;/a&gt;, inserting an item corresponds to
incrementing a counter. Some variants also feature a decrement operation to
remove item from a set. As soon we allow for deletions we introduce false
negative (FN) errors. These occur when removing an item that itself was a FP.
The probability of a FN is bounded by &lt;img src="http://tex.72pines.com/latex.php?latex=O(E_B)"/&gt;.&lt;/p&gt;

&lt;p&gt;When we get the count of an item &lt;img src="http://tex.72pines.com/latex.php?latex=x%5Cin%5Cwidehat%7BS%7D"/&gt;, we compute its set of
counters &lt;img src="http://tex.72pines.com/latex.php?latex=C_x"/&gt; and return the minimum value as frequency estimate
&lt;img src="http://tex.72pines.com/latex.php?latex=%5Cwidehat%7Bs%7D_x"/&gt;. This query algorithm is also known as &lt;em&gt;minimum
selction&lt;/em&gt; (MS).&lt;/p&gt;

&lt;p&gt;There are two main issues with counting Bloom filters:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Counter overflows&lt;/li&gt;
  &lt;li&gt;Choosing &lt;img src="http://tex.72pines.com/latex.php?latex=w"/&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The first problem exists when we the counter value reaches &lt;img src="http://tex.72pines.com/latex.php?latex=2%5Ew%20-%201"/&gt; and
cannot be incremented any more. The most natural thing to do is simply not
continuing to count rather than overflowing and restarting at 0. However, this
introduces &lt;em&gt;undercounts&lt;/em&gt;, which we also refer to as FNs.&lt;/p&gt;

&lt;p&gt;The second problem concerns the choice of the width parameter &lt;img src="http://tex.72pines.com/latex.php?latex=w"/&gt;. If we
choose &lt;img src="http://tex.72pines.com/latex.php?latex=w"/&gt; very large, the space gains of a Bloom filter diminish. There
will also be a lot of unused space, manifesting as unused zeros. If we choose
&lt;img src="http://tex.72pines.com/latex.php?latex=w"/&gt; too small, we will reach the maximum counter value to fast. Choosing
the right value is a difficult trade-off that also depends on the data. It has
been shown that for uniform distributions a value of &lt;img src="http://tex.72pines.com/latex.php?latex=w%20=%204"/&gt; works well.&lt;/p&gt;

&lt;h3 id="bitwise"&gt;Bitwise&lt;/h3&gt;

&lt;p&gt;&lt;img src="http://matthias.vallentin.net/blog/2011/06/bf-bitwise.png" alt="The bitwise Bloom filter consists of l counting Bloom filters, each of which represent w_i orders of magnitude of the entire counter. This Figure illustrates a bitwise Bloom filter with w_i = 1." style="float: right; margin-left: 10px; margin-bottom: 10px" align="right"&gt;
The &lt;a href="ftp://ftp.cs.rochester.edu/pub/papers/theory/07.tr927.Bitwise_bloom_filter.pdf"&gt;bitwise Bloom filter&lt;/a&gt; is a combination of &lt;img src="http://tex.72pines.com/latex.php?latex=l"/&gt; counting Bloom
filters with bit vectors &lt;img src="http://tex.72pines.com/latex.php?latex=V_i"/&gt;, each of which have &lt;img src="http://tex.72pines.com/latex.php?latex=m_i"/&gt; cells,
&lt;img src="http://tex.72pines.com/latex.php?latex=k_i"/&gt; hash functions, and width &lt;img src="http://tex.72pines.com/latex.php?latex=w_i"/&gt; where &lt;img src="http://tex.72pines.com/latex.php?latex=i%5Cin%5C%7B0,%5Cdots,l-1%5C%7D"/&gt;.
It aims at solving both of the overflow and space problem of the counting Bloom
filter.&lt;/p&gt;

&lt;p&gt;To add an item &lt;img src="http://tex.72pines.com/latex.php?latex=x"/&gt;, we first look at the counters in the first level
&lt;img src="http://tex.72pines.com/latex.php?latex=V_0"/&gt;. If there is enough room (i.e., width) available, we simply perform
the increment.  If the counter overflows, we insert &lt;img src="http://tex.72pines.com/latex.php?latex=x"/&gt; into &lt;img src="http://tex.72pines.com/latex.php?latex=V_1"/&gt; and
remove it from &lt;img src="http://tex.72pines.com/latex.php?latex=V_0"/&gt;. In this fashion, the counter value is unbounded as
we can always add more levels. However, the item has to be hashed &lt;img src="http://tex.72pines.com/latex.php?latex=l"/&gt; times
with a total of &lt;img src="http://tex.72pines.com/latex.php?latex=%5Csum_%7Bi=0%7D%5E%7Bl-1%7D%20k_i"/&gt; hash functions.&lt;/p&gt;

&lt;p&gt;In order to read the counter of an item, we combine the binary representation
of all the levels. Let &lt;img src="http://tex.72pines.com/latex.php?latex=c_i"/&gt; be the counter value at level &lt;img src="http://tex.72pines.com/latex.php?latex=i"/&gt;. Then we
compute the counter value as&lt;/p&gt;

&lt;p align="center"&gt;&lt;img src="http://tex.72pines.com/latex.php?latex=C%20=%20%5Csum_%7Bi=0%7D%5E%7Bl-1%7D%20c_i%202%5E%7B%5Csum_%7Bj=0%7D%5Ei%20w_i%7D."/&gt;&lt;/p&gt;

&lt;h3 id="spectral"&gt;Spectral&lt;/h3&gt;

&lt;p&gt;The &lt;a href="http://theory.stanford.edu/~matias/papers/sbf_tech_report.pdf"&gt;spectral Bloom filter&lt;/a&gt; is an optimized version of the
counting Bloom filter. It consists of two extra algorithms in addition to MS
and introduces a more space-efficient data structure to represent counters.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;Let us review the MS algorithm. When querying an item &lt;img src="http://tex.72pines.com/latex.php?latex=q%5Cin%20U"/&gt;, MS uses
the minimum counter value &lt;img src="http://tex.72pines.com/latex.php?latex=m_q%20=%20%5Cmin_i%5C;C_q"/&gt; as frequency estimate,
i.e., &lt;img src="http://tex.72pines.com/latex.php?latex=%5Cwidehat%7Bf%7D_q%20=%20m_q"/&gt;. Cohen and Matias claim that &lt;img src="http://tex.72pines.com/latex.php?latex=f_x%20%5Cle%20m_x"/&gt;
and &lt;img src="http://tex.72pines.com/latex.php?latex=%5Cmathbb%7BP%7D(%5Cwidehat%7Bf%7D_x%20%5Cneq%20m_x)%20=%20%5Cmathbb%7BP%7D(E_B)"/&gt; for all 
&lt;img src="http://tex.72pines.com/latex.php?latex=x%5Cin%20S"/&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The second spectral algorithm is an optimization for the add operation. When
adding an item &lt;img src="http://tex.72pines.com/latex.php?latex=x"/&gt; to the Bloom filter, the &lt;em&gt;minimum increase&lt;/em&gt; (MI)
algorithm only increments the minimum counter value(s)
&lt;img src="http://tex.72pines.com/latex.php?latex=%5Ctilde%7BC%7D_x%20=%20%5Cmin_i%5C;C_x"/&gt;. The rationale behind this is that &lt;img src="http://tex.72pines.com/latex.php?latex=m_x"/&gt; is
always the most accurate count, thus MI results in the fewest possible
increment operations. &lt;/p&gt;

    &lt;p&gt;Because not all counters are incremented on inserts, the effect of deletes
is significantly worse and the number of FNs becomes unbounded. Thus, the MI
algorithm should not be used when removing of items from a Bloom filter.
Cohen and Matias claim that &lt;img src="http://tex.72pines.com/latex.php?latex=E_B%5E%7BMI%7D%20=%20O(E_B)"/&gt; and that if &lt;img src="http://tex.72pines.com/latex.php?latex=x"/&gt; is drawn
uniformly from &lt;img src="http://tex.72pines.com/latex.php?latex=U"/&gt;, then &lt;img src="http://tex.72pines.com/latex.php?latex=E_B%5E%7BMI%7D%20=%20E_B/k"/&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The third algorithm is &lt;em&gt;recurring minimum&lt;/em&gt; (RM) and involves two Bloom
filters, &lt;img src="http://tex.72pines.com/latex.php?latex=V_1"/&gt; and &lt;img src="http://tex.72pines.com/latex.php?latex=V_2"/&gt;. The key insight behind RM is that items that
experience Bloom errors are less likely to have recurring minima counter
values. Cohen and Matias found empirically that this applies to
approximately 20% of the items. Such items with a unique minimum are
maintained in the second Bloom filter to reduce the discrepancy between
&lt;img src="http://tex.72pines.com/latex.php?latex=f_x"/&gt; and &lt;img src="http://tex.72pines.com/latex.php?latex=%5Cwidehat%7Bf%7D_x"/&gt;.&lt;/p&gt;

    &lt;p&gt;To query an item &lt;img src="http://tex.72pines.com/latex.php?latex=q%5Cin%20U"/&gt; according to the RM algorithm, we look first
into the first Bloom filter and check if &lt;img src="http://tex.72pines.com/latex.php?latex=q"/&gt; has a recurring minimum. If
so, we return the minimum counter value. Otherwise we look the minimum
counter value from the second Bloom filter, unless it is 0. If it is 0
(i.e., does not exist), we return the minimum counter from the first Bloom
filter.&lt;/p&gt;

    &lt;p&gt;Since all the items are inserted into the first bloom filter, the RM
optimization does at least as well as the MS algorithm, yet has usually
better error rates because a second filter holding fewer items is used for
items which experience higher error rates.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The fancy data-structure takes &lt;img src="http://tex.72pines.com/latex.php?latex=N%20+%20o(N)%20+%20O(n)"/&gt; space, where &lt;img src="http://tex.72pines.com/latex.php?latex=n"/&gt; is the
number of distinct items and &lt;img src="http://tex.72pines.com/latex.php?latex=N%20=%20k%5Csum_%7Bx%5Cin%20S%7D%20%5Clceil%20%5Clog%20f_x%20%5Crceil"/&gt;.
For details, please refer to &lt;a href="http://theory.stanford.edu/~matias/papers/sbf_tech_report.pdf"&gt;the paper&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id="aging"&gt;Aging&lt;/h2&gt;
&lt;p&gt;A problem all the above Bloom filter variants is that they eventually fill up
over time when dealing with a large set or stream of data. This means that at
some point the Bloom filter becomes unusable due to its high error rates. There
exist various scenarios where one would like to &amp;#x201C;age out&amp;#x201D; old items that have
been added a long time ago. For example, we might want to estimate only recent
items or we have a very limited amount of space available.&lt;/p&gt;

&lt;p&gt;Although counting Bloom filters have a delete operation, it is often impossible
to retain old items in memory. Thus we do not know their counter positions in
the bit vector anymore, otherwise we would simply decrement their count. What
we want is a Bloom filter that has &lt;em&gt;sliding window&lt;/em&gt; semantics, as
illustrated by the Figure below.&lt;/p&gt;

&lt;p&gt;&lt;img src="http://matthias.vallentin.net/blog/2011/06/sliding-window.png" alt="In a sliding window scenario, an insert operation for a new item x_7 would ideally delete an old item x_0." style="display: block; margin: 0 auto;"&gt;&lt;/p&gt;

&lt;p&gt;To support a sliding window, we would like to have Bloom filter that acts like
a FIFO. In the following, we discuss two different Bloom filter flavors that
aim at providing this property.&lt;/p&gt;

&lt;h3 id="stable"&gt;Stable&lt;/h3&gt;

&lt;p&gt;The &lt;a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.85.1569"&gt;stable Bloom filter&lt;/a&gt; is essentially a basic Bloom filter with an
underlying bit vector with a fixed cell width &lt;img src="http://tex.72pines.com/latex.php?latex=w"/&gt;. However, counters do not
represent the multiplicities of the items but rather their age. Thus the
interface supports only set membership queries.&lt;/p&gt;

&lt;p&gt;To insert an item, we decrement &lt;img src="http://tex.72pines.com/latex.php?latex=d"/&gt; cells chosen uniformly at random.
Thereafter, we set the counters of all &lt;img src="http://tex.72pines.com/latex.php?latex=k"/&gt; cells to their maximum value of
&lt;img src="http://tex.72pines.com/latex.php?latex=2%5Ew%20-%201"/&gt;. &lt;/p&gt;

&lt;p&gt;Deng and Rafiei have shown that the fraction of zeros will eventually become
constant. When having reached this &lt;em&gt;stable point&lt;/em&gt;, the probability of a Bloom
error is&lt;/p&gt;

&lt;p align="center"&gt;&lt;img src="http://tex.72pines.com/latex.php?latex=%5Cphi_P%20=%20%5Cleft(%201%20-%5Cleft(%20%5Cfrac%7B1%7D%7B1+%5Cfrac%7B1%7D%7Bd(1/k-1/m)%7D%7D%20%5Cright)%20%5Cright)"/&gt;&lt;/p&gt;

&lt;h3 id="asup2sup"&gt;A&lt;sup&gt;2&lt;/sup&gt;
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="http://portal.acm.org/citation.cfm?id=1685986"&gt;&lt;img src="http://tex.72pines.com/latex.php?latex=A%5E2"/&gt; Bloom filter&lt;/a&gt;, also known as &lt;strong&gt;active-active buffering&lt;/strong&gt;,
provides another type of FIFO. It uses two single-bit vectors &lt;img src="http://tex.72pines.com/latex.php?latex=V_1"/&gt; and
&lt;img src="http://tex.72pines.com/latex.php?latex=V_2"/&gt; where &lt;img src="http://tex.72pines.com/latex.php?latex=%7CV_1%7C%20=%20%7CV_2%7C%20=%20%5Cfrac%7Bm%7D%7B2%7D"/&gt;. Unlike the spectral RM
algorithm, one Bloom filter is not a subset of the other, so an item can be in
either Bloom filter.&lt;/p&gt;

&lt;p&gt;To query for an item, we return true if &lt;img src="http://tex.72pines.com/latex.php?latex=q"/&gt; exists in either &lt;img src="http://tex.72pines.com/latex.php?latex=V_1"/&gt; or
&lt;img src="http://tex.72pines.com/latex.php?latex=V_2"/&gt;. To insert an item &lt;img src="http://tex.72pines.com/latex.php?latex=x"/&gt;, we simply return if it already exists in
&lt;img src="http://tex.72pines.com/latex.php?latex=V_1"/&gt;. Otherwise we insert it in &lt;img src="http://tex.72pines.com/latex.php?latex=V_1"/&gt; and test whether &lt;img src="http://tex.72pines.com/latex.php?latex=V_1"/&gt; has
reached its capacity. If it is full, we flush &lt;img src="http://tex.72pines.com/latex.php?latex=V_2"/&gt; and swap &lt;img src="http://tex.72pines.com/latex.php?latex=V_1"/&gt; and
&lt;img src="http://tex.72pines.com/latex.php?latex=V_2"/&gt;. Thereafter we insert the item in &lt;img src="http://tex.72pines.com/latex.php?latex=V_1"/&gt; (the old &lt;img src="http://tex.72pines.com/latex.php?latex=V_2"/&gt;).&lt;/p&gt;

&lt;p&gt;One advantage of the &lt;img src="http://tex.72pines.com/latex.php?latex=A%5E2"/&gt; Bloom filter is space-efficiency, since one bit
vector is always full. Let the subscript &lt;img src="http://tex.72pines.com/latex.php?latex=a"/&gt; denote the value of the
&lt;img src="http://tex.72pines.com/latex.php?latex=A%5E2"/&gt; Bloom filter. The probability of a Bloom error is&lt;/p&gt;

&lt;p align="center"&gt;&lt;img src="http://tex.72pines.com/latex.php?latex=%7B%5Cphi_P%7D_a%20=%201%20-%20%5Csqrt%7B1-%5Cphi_P%7D"/&gt;&lt;/p&gt;

&lt;p&gt;and the optimal value for &lt;img src="http://tex.72pines.com/latex.php?latex=k_a"/&gt; and &lt;img src="http://tex.72pines.com/latex.php?latex=%5Ckappa_a"/&gt; are:&lt;/p&gt;

&lt;p align="center"&gt;&lt;img src="http://tex.72pines.com/latex.php?latex=k_a%5E*%20=%0A%5Cleft%5Clfloor%20-%5Clog_2%5Cleft(1-%5Csqrt%7B1-%5Cphi_P%7D%5Cright)%20%5Cright%5Crfloor%0A%5Cqquad%0A%5Ckappa_a%5E*%20=%20%5Cleft%5Clfloor%20%5Cfrac%7Bm%7D%7B2k_a%5E*%7D%20%5Cln2%20%5Cright%5Crfloor"/&gt;&lt;/p&gt;

&lt;h1 id="libbf"&gt;libBf&lt;/h1&gt;

&lt;p&gt;As part of a class project for the course &lt;a href="http://www.cs.berkeley.edu/~satishr/cs270/"&gt;Combinatorial Algorithms and Data
Structures&lt;/a&gt; in Spring 2011 at UC
Berkeley, I decided to write a &lt;a href="http://en.wikipedia.org/wiki/C%2B%2B0x"&gt;C++11&lt;/a&gt;
&lt;strong&gt;lib&lt;/strong&gt;rary of &lt;strong&gt;B&lt;/strong&gt;loom &lt;strong&gt;f&lt;/strong&gt;ilters, &lt;a href="http://matthias.vallentin.net/libbf"&gt;libBf&lt;/a&gt;, which implements the
above discussed Bloom filters. &lt;a href="http://matthias.vallentin.net/course-work/cs270-s11.pdf"&gt;Slides&lt;/a&gt; of the
final presentation are also available; they go a little deeper into the
algorithmic details.&lt;/p&gt;

&lt;h1 id="related-work"&gt;Related Work&lt;/h1&gt;

&lt;p&gt;I only presented a few Bloom filter types in this article, but active research
in this field yielded many more variations. For example, the &lt;a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.151.8477"&gt;dynamic&lt;/a&gt;
and &lt;a href="http://gsd.di.uminho.pt/members/cbm/ps/dbloom.pdf"&gt;scalable&lt;/a&gt; Bloom filter are two variants that grow dynamically
as soon as more items are added. Bloom filters can also be
&lt;a href="http://www.eecs.harvard.edu/~michaelm/NEWWORK/postscripts/cbf.pdf"&gt;compressed&lt;/a&gt;, e.g., when sending them over the network.
&lt;a href="http://www.siam.org/proceedings/alenex/2006/alx06_004akirsch.pdf"&gt;Distance-sensitive&lt;/a&gt; Bloom filters give more than a binary answer or
count when asking for an item: they also return if an item is close to another
item in the set. Finally, there exist &lt;a href="http://webee.technion.ac.il/~ayellet/Ps/nelson.pdf"&gt;Bloomier&lt;/a&gt; filters which
extend the set membership query model and counting notion to computations of
arbitrary functions.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/Cuckoo_hashing"&gt;Cuckoo hashing&lt;/a&gt; is a related
space-efficient alternative to Bloom filters. Moreover, Adam Langley has some
&lt;a href="http://www.imperialviolet.org/2011/04/29/filters.html"&gt;interesting thoughts&lt;/a&gt;
on Golomb Compressed Sets.&lt;/p&gt;

&lt;img src="http://feeds.feedburner.com/~r/blog-atom/~4/qOnrbkuOOmw" height="1" width="1"/&gt;</content>
    <summary type="html">&lt;p&gt;In this article, I explain how &lt;strong&gt;Bloom filters&lt;/strong&gt; work and introduce several
variations that evolved as a result of extensive academic treatment of this
topic. Moreover, I introduce my own implementation of these Bloom filters,
&lt;a href="/libbf/"&gt;libBf&lt;/a&gt;, a header-only &lt;a href="http://en.wikipedia.org/wiki/C%2B%2B0x"&gt;C++11&lt;/a&gt;
library that comes with a BSD-style licence.&lt;/p&gt;</summary>
  <feedburner:origLink>http://matthias.vallentin.net/blog/2011/06/a-garden-variety-of-bloom-filters/</feedburner:origLink></entry>
  <entry>
    <id>tag:matthias.vallentin.net,2011-06-12:/blog/2011/06/analyzing-facebook-webchat-sessions-with-bro/</id>
    <title type="html">Analyzing Facebook Webchat Sessions with Bro</title>
    <published>2011-06-12T07:00:00Z</published>
    <updated>2011-11-03T07:00:00Z</updated>
    <link rel="alternate" href="http://feeds.matthias.vallentin.net/~r/blog-atom/~3/mUDtccHYc8A/" />
    <content type="html">&lt;p&gt;The Facebook webchat allows you to chat with your friends while having a
Facebook window open in the browser. In this post, I describe how the webchat
protocol works and show how to write a &lt;a href="http://www.bro-ids.org"&gt;Bro&lt;/a&gt; script
that analyzes chat sessions.&lt;/p&gt;

&lt;h1 id="the-facebook-webchat-protocol"&gt;The Facebook Webchat Protocol&lt;/h1&gt;

&lt;p&gt;&lt;img src="http://matthias.vallentin.net/blog/2011/06/fb-icon.png" alt="Facebook" style="float: right; margin-left: 10px; margin-bottom: 10px" align="right"&gt;

&lt;p&gt;As a traffic analyst, you might wonder how one can get insight into the
webchat protocol details and how to work with it at a higher level. After all,
who wants to write boilerplate code whose only purpose is to fight the
&lt;em&gt;representation of the data&lt;/em&gt; rather than analyzing the data itself? Let us
extract messages from a Facebook webchat conversation and put them into
Bro data structures where they are easy to manipulate, print, and react upon.
This involves parsing the JSON objects, which look like this:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(;;);{&lt;/span&gt;&lt;span class="s2"&gt;"t"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"msg"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"c"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"p_100002331422524"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"s"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="o"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"ms"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="s2"&gt;"window_id"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;1985081376&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"type"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"unfocus_chat"&lt;/span&gt;&lt;span class="p"&gt;}]}&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The above example is an &lt;code&gt;unfocus_chat&lt;/code&gt; event sent over the AJAX channel,
indicating that the user placed the focus somewhere else on the page, away from
the chat window. Here is another HTTP body:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(;;);{&lt;/span&gt;&lt;span class="s2"&gt;"t"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"msg"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"c"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"p_100002331422524"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"s"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"ms"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="s2"&gt;"msg"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="s2"&gt;"text"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"So I need the URL, dude.  What is it?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;1303218454567&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="o"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"clientTime"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;1303218453582&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"msgID"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"2755876075"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="s2"&gt;"from"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;100002331422524&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="o"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"to"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;100002297942500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"from_name"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"Mondo Cheeze"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"from_first_name"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"Mondo"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="o"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"from_gender"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"to_name"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"Udder Kaos"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"to_first_name"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"Udder"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="o"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"to_gender"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"type"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"msg"&lt;/span&gt;
&lt;span class="p"&gt;}]}&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This one is an actual chat message. The nice thing is that such messages are
self-contained and include quite some meta information: contents, timestamps,
names, unique IDs, and even genders.&lt;/p&gt;

&lt;h1 id="the-bro-script"&gt;The Bro Script&lt;/h1&gt;

&lt;p&gt;Alas, the HTTP body is a big fat opaque string with no structure and parsing
nested data in strings with just regular expression is clunky at best. (For
those who have spare cycles: an extremely useful project would be to create a
Bro analyzer that exposes first-class types of the DOM tree of a document and
script-level primitives for basic JavaScript analysis.) A crude way to do this
is splitting the string on &lt;code&gt;,"&lt;/code&gt;, then finding the right key-value pairs, and
populating the following Bro data structures with the parsed data:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;chat_message&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;record&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;# Message timestamp.&lt;/span&gt;
    &lt;span class="n"&gt;from&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;       &lt;span class="c1"&gt;# Name of the sender&lt;/span&gt;
    &lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;         &lt;span class="c1"&gt;# Name of the recipient.&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;       &lt;span class="c1"&gt;# The actual message.&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;chat_session&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;record&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;time&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;        &lt;span class="c1"&gt;# Unix timestamp of first message.&lt;/span&gt;
    &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;time&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;          &lt;span class="c1"&gt;# Unix timestamp of last message.&lt;/span&gt;
    &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;count&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;           &lt;span class="c1"&gt;# Total number of messages in session.&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;At this point it is easy to generate NOTICES with chat messages, look for
suspicious messages, and more generally, leverage Bro&amp;#x2019;s full scripting language
more effectively. I wrote a basic script that uses these data types to dump a
chat session between two buddies. You can &lt;a href="https://github.com/mavam/brospects/facebook.bro"&gt;download it&lt;/a&gt; from my
&lt;a href="https://github.com/mavam/brospects"&gt;brospects&lt;/a&gt;  repository, which contains a mixed bag of experimental
Bro scripts that are either in preparation for or don&amp;#x2019;t make it into the
&lt;a href="http://git.bro-ids.org/bro-scripts.git"&gt;official Bro scripts repository&lt;/a&gt;. Here
is the output of &lt;code&gt;facebook.bro&lt;/code&gt; of a sample Facebook webchat session:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;1303218454567 (Mondo Cheeze - Udder Kaos) So I need the URL, dude.  What is it?
1303218465938 (Udder Kaos - Mondo Cheeze) the URL?
1303218474259 (Mondo Cheeze - Udder Kaos) Yeah for the secret image
1303218481721 (Udder Kaos - Mondo Cheeze) ok lemme see
1303218495626 (Mondo Cheeze - Udder Kaos) Someone could be sniffing this conversation, be sure to send it safely
1303218503972 (Udder Kaos - Mondo Cheeze) ?
1303218570782 (Mondo Cheeze - Udder Kaos) Cmon we talked about this.  Encrypt it with WonderCipher-92 \
                                           and send me the Base64 encoding of the hex.  Usual key.
1303218587568 (Udder Kaos - Mondo Cheeze) 'k.  So here it is:
1303218595067 (Udder Kaos - Mondo Cheeze) NmQwMDJjZDdhZTdlYmYxNTc5MGVjZDc1YTYxNDk1OGE0ZTRhYjAzOTVi
1303218618252 (Mondo Cheeze - Udder Kaos) What's the IV
1303218624712 (Udder Kaos - Mondo Cheeze) huh?
1303218637197 (Mondo Cheeze - Udder Kaos) Initialization vector, you maroon.  WC-92 is a stream cipher, you know
1303218667601 (Udder Kaos - Mondo Cheeze) oh yeah.  I used my birthday, all as one number.
1303218685436 (Udder Kaos - Mondo Cheeze) you *do* remember it, right?
1303218700515 (Mondo Cheeze - Udder Kaos) yeah your an April Fool, not hard to remember
1303218710402 (Udder Kaos - Mondo Cheeze) heh
1303218718486 (Mondo Cheeze - Udder Kaos) K gimme a sec to decrypt then.
1303218733463 (Mondo Cheeze - Udder Kaos) Hey idiot this isn't the secret, it's Google's home page.
1303218745028 (Udder Kaos - Mondo Cheeze) whoops hang on, blew my cutpaste
1303218767633 (Udder Kaos - Mondo Cheeze) okay, here's the right one:
1303218776922 (Udder Kaos - Mondo Cheeze) NmQwMDJjZDdhZTdlYmYwMDY3MGRjZDdlYjA1NDlhODQ0ZjA1YmEyNDRm
1303218800303 (Mondo Cheeze - Udder Kaos) And?
1303218807537 (Udder Kaos - Mondo Cheeze) and what
1303218815022 (Mondo Cheeze - Udder Kaos) What's the IV
1303218824330 (Udder Kaos - Mondo Cheeze) huh?
1303218839537 (Mondo Cheeze - Udder Kaos) yo maroon same thing as we just discussed a moment ago, sheesh
1303218855518 (Udder Kaos - Mondo Cheeze) oh that yeah like I said my birthday
1303218869728 (Mondo Cheeze - Udder Kaos) You used the same IV as before????
1303218889893 (Udder Kaos - Mondo Cheeze) right, otherwise how would I remember it?
1303218900257 (Mondo Cheeze - Udder Kaos) YOU BOZO
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In summary, this is an example of how to translate low-level representation of
communication into higher abstractions that are easier to work with. After
creating first-class Bro types for the involved entities, one can now leverage
the real power of Bro&amp;#x2019;s scripting language.&lt;/p&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/blog-atom/~4/mUDtccHYc8A" height="1" width="1"/&gt;</content>
    <summary type="html">&lt;p&gt;The Facebook webchat allows you to chat with your friends while having a
Facebook window open in the browser. In this post, I describe how the webchat
protocol works and show how to write a &lt;a href="http://www.bro-ids.org"&gt;Bro&lt;/a&gt; script
that analyzes chat sessions.&lt;/p&gt;</summary>
  <feedburner:origLink>http://matthias.vallentin.net/blog/2011/06/analyzing-facebook-webchat-sessions-with-bro/</feedburner:origLink></entry>
  <entry>
    <id>tag:matthias.vallentin.net,2010-10-29:/blog/2010/10/taming-the-sheep-detecting-sidejacking-with-bro/</id>
    <title type="html">Taming the Sheep: Detecting Sidejacking with Bro</title>
    <published>2010-10-29T07:00:00Z</published>
    <updated>2011-06-22T07:00:00Z</updated>
    <link rel="alternate" href="http://feeds.matthias.vallentin.net/~r/blog-atom/~3/CslmEjr-AUg/" />
    <content type="html">&lt;p class="notice"&gt;&lt;em class="caption"&gt;Update&lt;/em&gt;
&lt;em class="date"&gt;June 22, 2011&lt;/em&gt;
You might also want to read the follow-up article about &lt;a href="http://matthias.vallentin.net/blog/2011/06/better-nids-performance-by-tracking-roaming-users/"&gt;reducing false
positives by incorporating ARP
traffic&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The Firefox extension &lt;a href="http://codebutler.com/firesheep"&gt;Firesheep&lt;/a&gt; brings
&lt;a href="http://erratasec.blogspot.com/2007/08/sidejacking-with-hamster_05.html"&gt;sidejacking&lt;/a&gt; (or &lt;a href="http://fscked.org/projects/cookiemonster"&gt;cookie hijacking&lt;/a&gt;) to the masses.
Its convenient point-and-click interface allows even unskilled users to hijack
their friend&amp;#x2019;s Facebook and Twitter accounts from a public wireless network,
such as a coffee shop or at the airport.&lt;/p&gt;

&lt;p&gt;This is a relevant issue because most popular Web 2.0 services only use TLS for
the initial authentication and then drop back to unencrypted HTTP throughout
the rest of communication. As a consequence, the session cookies fly around in
clear text and are easily captured.&lt;/p&gt;

&lt;h1 id="detection"&gt;Detection&lt;/h1&gt;

&lt;p&gt;As network operators and security analysts, we would like to know when session
hijacking occurs. In the following, I present the design and implementation of
a sidejacking detector that I wrote for the &lt;a href="http://www.bro-ids.org"&gt;Bro&lt;/a&gt;
intrusion detection system.&lt;/p&gt;

&lt;h2 id="design-considerations"&gt;Design Considerations&lt;/h2&gt;

&lt;p&gt;The detection strategy depends on the network topology and available context.
One problem is how to define a user. In flat IP address space, it makes sense
to identify a user by its IP address, but this would fail in a NAT environment,
where a single IP accomodates several users. For NAT scenarios, one could
define a user as a pair of IP address and &lt;code&gt;User-Agent&lt;/code&gt; header. For large
NATs or NATs with identically configured machines, this latter notion of user
could be prone to false positivies, but this trade-off has to be faced.
Therefore, the notion of what constitutes a user can be configured in the
detector.&lt;/p&gt;

&lt;p&gt;Another issue poses the presence of absence of DHCP context. Consider a hotspot
or coffeeshop network operator with a private IP space. If Bro can see DHCP
traffic, we can crisply identify a user by its MAC address and do not have
to resort to high-level notions, such as provided by HTTP headers. In a static
setup, however, this context will likely not be available. At the same time,
if a different address reuses a session cookie in a static IP topology, we
probably observed a sidejacking attack. Furthermore, if the same address uses
the same sesssion cookie from a different user agent, we report such activity.&lt;/p&gt;

&lt;p&gt;There is another subtlety: the cookie header consists of a list of key-value
pairs, yet only a subset of those represent a user session while the others
could be random. Simply comparing the entire cookie string against a value seen
in the past would thus be prone to false negatives. Hence we have to restrict
ourselves to the relevant session-identifying subset. In fact, this is how
Firesheeps works: it ships with site-specific handlers that define the cookie
keys to extract and then sets only those to impersonate a user. This motivates
the following design: if a specification for a particular service is available,
restrict the cookie to the relevant fields, and otherwise use the cookie as a
whole. The default set of known services that ships with the detector is based
on all handlers Firesheep currently implements.&lt;/p&gt;

&lt;h2 id="decision-trees"&gt;Decision Trees&lt;/h2&gt;

&lt;p&gt;Given these considerations, how should the detector operate? Clearly, we need a
flexible mechanism that can be configured to meet the specific conditions.
Let us distinguish the two notions of users. If we have the opaque NAT setting,
the detector operats according to the Figure below.&lt;/p&gt;

&lt;p&gt;&lt;img src="http://matthias.vallentin.net/blog/2010/09/user_id.png" alt="User identification purely based on IP address" style="display: block; margin: 0 auto;"&gt;

&lt;p&gt;Updating the cookie context is not really an actionable item but means that the
per-cookie state (such as when it was last seen) is being updated. In
comparsion, the decision tree for a flat address space looks a little more
elaborate.&lt;/p&gt;

&lt;p&gt;&lt;img src="http://matthias.vallentin.net/blog/2010/09/user_id_ext.png" alt="User identification based on IP address, User-Agent, and available DHCP context" style="display: block; margin: 0 auto;"&gt;

&lt;p&gt;Note that session cookie reuse is only reported when the user is defined as IP
address.&lt;/p&gt;

&lt;h2 id="operation"&gt;Operation&lt;/h2&gt;

&lt;p&gt;Simply using the sidejacking detector is straight-forward and does not require
detailed knowledge of Bro. Since it is implemented as a Bro script, running it
only requires passing the filename of the script on the command line:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span class="gp"&gt;&lt;/span&gt; bro -i eth0 sidejack&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;When Bro detects session hijacking, it will report a &lt;code&gt;Sidejacking&lt;/code&gt; notice:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;TIMESTAMP Sidejacking ATTACKER ADDRESS (ATTACKER HTTP SESSION ID) @ 'ATTACKER
USER_AGENT' hijacked SERVICE session (VICTIM HTTP SESSION) of VICTIM ADDRESS @
'VICTIM USER AGENT' last seen at SESSION COOKIE LAST SEEN via cookie SESSION
COOKIE&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Similarly, session cookie reuse is reported as &lt;code&gt;SessionCookieReuse&lt;/code&gt; and
has the following format:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ATTACKER ADDRESS (ATTACKER HTTP SESSION ID) reused SERVICE session VICTIM
SESSION ID in user agent 'ATTACKER USER AGENT', where previous user agent was
'VICTIM USER AGENT' and last seen at SESSION COOKIE LAST SEEN via cookie
SESSION COOKIE&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Extending the detector with a new site profile is also not too complicated. I
would recommend to write a new Bro script and leaves the original one
unmodified. Similar to writing a new Firesheep handler, a corresponding
detector for a fictive service Foo at foo.com where a user session comprises
the two cookies &lt;code&gt;session_id&lt;/code&gt; and &lt;code&gt;auth_token&lt;/code&gt; would look as follows:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span class="cp"&gt;@load sidejack&lt;/span&gt;

&lt;span class="k"&gt;redef&lt;/span&gt; &lt;span class="n"&gt;HTTP&lt;/span&gt;&lt;span class="nn"&gt;::&lt;/span&gt;&lt;span class="n"&gt;cookie_list&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"Foo"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[$&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sr"&gt;/foo.com/&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="p"&gt;$&lt;/span&gt;&lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kt"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"session_id"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"auth_token"&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;If the cookie keys are dynamically generated, as it is the case with Wordpress,
one can also specify a regular expression instead:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span class="k"&gt;redef&lt;/span&gt; &lt;span class="n"&gt;HTTP&lt;/span&gt;&lt;span class="nn"&gt;::&lt;/span&gt;&lt;span class="n"&gt;cookie_list&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"Foo"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[$&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sr"&gt;/foo.com/&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="p"&gt;$&lt;/span&gt;&lt;span class="n"&gt;pat&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sr"&gt;/session_[0-9A-F]+/&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Telling the detector to use DHCP involves mereley flipping a switch:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span class="k"&gt;redef&lt;/span&gt; &lt;span class="n"&gt;HTTP&lt;/span&gt;&lt;span class="nn"&gt;::&lt;/span&gt;&lt;span class="n"&gt;use_dhcp_aliases&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;T&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Similarly, changing the notion of a user to include the user agent:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span class="k"&gt;redef&lt;/span&gt; &lt;span class="n"&gt;HTTP&lt;/span&gt;&lt;span class="nn"&gt;::&lt;/span&gt;&lt;span class="n"&gt;user_is_ip&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;F&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;To extend the analysis to all cookies and not only the listed known service
profiles:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span class="k"&gt;redef&lt;/span&gt; &lt;span class="n"&gt;HTTP&lt;/span&gt;&lt;span class="nn"&gt;::&lt;/span&gt;&lt;span class="n"&gt;known_services_only&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;F&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Finally, it is possible to tweak the cookie expiration interval, i.e., the time
how long Bro remembers a cookie session. The example below tells Bro to keep
cookies for 10 minutes only.&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span class="k"&gt;redef&lt;/span&gt; &lt;span class="n"&gt;HTTP&lt;/span&gt;&lt;span class="nn"&gt;::&lt;/span&gt;&lt;span class="n"&gt;cookie_expiration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="ld"&gt;10 min&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;If an attacker uses the cookie after the expiration interval, Bro will not
detect it. Naturally, there is a trade-off between accumulating state in memory
and opportunity for evasion.&lt;/p&gt;

&lt;h1 id="mitigation"&gt;Mitigation&lt;/h1&gt;

&lt;p&gt;By weaponizing the general public, Firesheep will hopefully push vendors to
switch TLS-only deployments more rapidly. During this painful migration period,
techniques like &lt;a href="http://en.wikipedia.org/wiki/Strict_Transport_Security"&gt;HTTP Strict Transport Security&lt;/a&gt; (HSTS) could prove
useful. A HSTS-enabled server sends a &lt;code&gt;Strict-Transport-Security&lt;/code&gt; header to
the client that specifies a time period during which the client should use
encryption only. That is, the server upgrades the connection if the client
tries to use unencrypted HTTP.&lt;/p&gt;

&lt;h1 id="bro-code"&gt;Bro Code&lt;/h1&gt;

&lt;p&gt;Practical security is all about raising the bar. Hopefully this detector allows
network operators to pinpoint the use of sidejacking attacks, such as
facilitated by Firesheep, and take accountable action. The detector is part of
&lt;a href="http://git.bro-ids.org/bro-scripts.git"&gt;Bro script repository&lt;/a&gt;
which we created as an official place for user-contributed scripts. The latest
version of the script can be &lt;a href="http://git.bro-ids.org/bro-scripts.git/blob_plain/HEAD:/sidejack.bro"&gt;downloaded here&lt;/a&gt;.&lt;/p&gt;&lt;/p&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/blog-atom/~4/CslmEjr-AUg" height="1" width="1"/&gt;</content>
    <summary type="html">&lt;p&gt;The Firefox extension &lt;a href="http://codebutler.com/firesheep"&gt;Firesheep&lt;/a&gt; brings
&lt;a href="http://erratasec.blogspot.com/2007/08/sidejacking-with-hamster_05.html"&gt;sidejacking&lt;/a&gt; (or &lt;a href="http://fscked.org/projects/cookiemonster"&gt;cookie hijacking&lt;/a&gt;) to the masses.
Its convenient point-and-click interface allows even unskilled users to hijack
their friend’s Facebook and Twitter accounts from a public wireless network,
such as a coffee shop or at the airport.&lt;/p&gt;</summary>
  <feedburner:origLink>http://matthias.vallentin.net/blog/2010/10/taming-the-sheep-detecting-sidejacking-with-bro/</feedburner:origLink></entry>
  <entry>
    <id>tag:matthias.vallentin.net,2010-10-07:/blog/2010/10/probability-and-statistics-cheat-sheet/</id>
    <title type="html">Probability and Statistics Cheat Sheet</title>
    <published>2010-10-07T07:00:00Z</published>
    <updated>2011-06-05T07:00:00Z</updated>
    <link rel="alternate" href="http://feeds.matthias.vallentin.net/~r/blog-atom/~3/FyORQtSd0cg/" />
    <content type="html">&lt;p class="notice"&gt;&lt;em class="caption"&gt;Update&lt;/em&gt;
&lt;em class="date"&gt;June 5, 2011&lt;/em&gt;
The probability and statistics cheat sheet now has it&amp;#x2019;s &lt;a href="http://matthias.vallentin.net/probability-and-statistics-cookbook"&gt;own
page&lt;/a&gt;. Furthermore, I decided to rename
it to &lt;em&gt;cookbook&lt;/em&gt; because the sheer number of pages &lt;a href="http://www.johndcook.com/blog/2010/10/04/probability-and-statistics-cheat-sheet/"&gt;stretches the definition of
&lt;em&gt;cheat sheet&lt;/em&gt; quite a bit&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;While I started to explore the many univariate probability distribution
relationships last year, I created a PDF to help me remember them. This
document quickly grew into a basic probability theory reference and its
substantial parts evolved while I was taking the excellent course STAT 200B at
&lt;a href="http://www.stat.berkeley.edu"&gt;UC Berkeley&lt;/a&gt; taught by &lt;a href="http://www.stat.berkeley.edu/~cgk"&gt;Cari
Kaufman&lt;/a&gt;. Today, it is more like a cookbook
and helps me to swap back in the relevant context when facing statistical
challenges.&lt;/p&gt;

&lt;p&gt;The purpose of this cookbook is to bundle essential statistical knowledge with
a unified notation. If you find any mistakes or have suggestions for further
topics, I&amp;#x2019;d appreciate if you contact me. I will continue to add material that
I deem useful.&lt;/p&gt;

&lt;p class="center"&gt;&lt;a href="http://matthias.vallentin.net/probability-and-statistics-cookbook"&gt;&lt;strong&gt;Download&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1 id="screenshots"&gt;Screenshots&lt;/h1&gt;

&lt;p&gt;&lt;img src="http://matthias.vallentin.net/blog/2010/10/dist-disc.png" alt="Discrete Distributions" style="display: block; margin: 0 auto;"&gt;
&lt;img src="http://matthias.vallentin.net/blog/2010/10/dist-cont.png" alt="Continuous Distributions" style="display: block; margin: 0 auto;"&gt;&lt;/p&gt;

&lt;img src="http://feeds.feedburner.com/~r/blog-atom/~4/FyORQtSd0cg" height="1" width="1"/&gt;</content>
    <summary type="html">&lt;p&gt;While I started to explore the many univariate probability distribution
relationships last year, I created a PDF to help me remember them. This
document quickly grew into a basic probability theory reference and its
substantial parts evolved while I was taking the excellent course STAT 200B at
&lt;a href="http://www.stat.berkeley.edu"&gt;UC Berkeley&lt;/a&gt; taught by &lt;a href="http://www.stat.berkeley.edu/~cgk"&gt;Cari
Kaufman&lt;/a&gt;. Today, it is more like a cookbook
and helps me to swap back in the relevant context when facing statistical
challenges.&lt;/p&gt;</summary>
  <feedburner:origLink>http://matthias.vallentin.net/blog/2010/10/probability-and-statistics-cheat-sheet/</feedburner:origLink></entry>
  <entry>
    <id>tag:matthias.vallentin.net,2010-03-30:/blog/2010/03/should-you-trust-your-ssl-certificate/</id>
    <title type="html">Should You Trust Your SSL Certificate?</title>
    <published>2010-03-30T07:00:00Z</published>
    <updated>2011-03-28T07:00:00Z</updated>
    <link rel="alternate" href="http://feeds.matthias.vallentin.net/~r/blog-atom/~3/WkC8i_SVLvQ/" />
    <content type="html">&lt;p&gt;Is your SSL secure? I&amp;#x2019;m not sure that the answer is quite so binary after
reading the paper &lt;a href="http://files.cloudprivacy.net/ssl-mitm.pdf"&gt;Certified Lies: Detecting and Defeating Government
Interception Attacks Against SSL&lt;/a&gt; by &lt;a href="http://www.dubfire.net"&gt;Christopher
Soghoian&lt;/a&gt; and &lt;a href="http://www.sidstamm.com"&gt;Side Stamm&lt;/a&gt;. In
a so-called &lt;em&gt;compelled certificate creation attack&lt;/em&gt;, a government agency forces
a CA to issue a fake certificate to intercept an SSL-encrypted (or more
precisly: TLS-encrypted) session without triggering a warning in the victim&amp;#x2019;s
browser. This works seamlessly, because it does not matter who issued a
specific certificate as long as the browser sees a valid chain of trust
terminated with a known root certificate. Although the authors claim that this
attack is exercised in practice, data-driven evidence is yet lacking.&lt;/p&gt;

&lt;h1 id="proactive-protection"&gt;Proactive Protection&lt;/h1&gt;

&lt;p&gt;In order to assess whether or not one&amp;#x2019;s current TLS session is being
intercepted, the authors have developed a Firefox extension called CertLock,
which extends the browser history by collecting additional certificate
information such as its hash, the name and country of the issuing CA and the
website, and the trust chain up to the root CA. Each time the user revisits a
TLS-protected site for which a certificate in the history exists, CertLock
compares their two hash values. If a mismatch is detected, CertLock next
compares the issuing CAs&amp;#x2019; country. In the event that they differ, CertLock
presents a warning to the user; otherwise the page loads without any warnings. &lt;/p&gt;

&lt;p&gt;Although CertLock alleviates some problems with the compelled certificate
creation attack, it is still limited in the following ways.&lt;/p&gt;

&lt;h2 id="trust-on-first-use"&gt;Trust-On-First-Use&lt;/h2&gt;

&lt;p&gt;When visiting a TLS-protected website for the first time, the user immediately
faces a dilemma: how to trust the certificate chain when there is nothing to
compare to. Because CertLock operates by comparing the current certificate with
one from the history, it cannot detect if the first encountered certificate
chain is authentic. That is, CertLock can only protect the user for sites that
have already been visited in the past &lt;em&gt;and&lt;/em&gt; deemed secure.&lt;/p&gt;

&lt;p&gt;When planning a trip to a potentially hostile environment (e.g., &lt;a href="http://www.wallofsheep.com"&gt;DEF
CON&lt;/a&gt; or China), a common &lt;a href="http://nweaver.blogspot.com/2009/07/protocol-for-visiting-china-or-defcon.html"&gt;recommendation&lt;/a&gt; is to use a fresh laptop, or at least to replace the laptop&amp;#x2019;s harddrive
and start over with a fresh system installation. This is precisly the scenario
where CertLock would be necessary, but cannot function due to the lack of
browsing history.&lt;/p&gt;

&lt;h2 id="ground-truth"&gt;Ground Truth&lt;/h2&gt;

&lt;p&gt;A well-known issue with anomaly detection is the presence of an
adversary during the training phase, who can later conduct unnoticed attacks
when the system is in live operation. CertLock cannot distinguish counterfeit
from real certificate trust chains when trained maliciously. In fact, the
opposite of the intended behavior may occur: a benign certificate may be
misclassified as untrustworthy, and a forged certificate may be blindly
accepted. This problem highlights the need for a trustworthy past certificate
history, otherwise it is impossible to make an accurate decision.&lt;/p&gt;

&lt;h2 id="false-negatives"&gt;False Negatives&lt;/h2&gt;

&lt;p&gt;CertLock suffers from false negatives when &lt;em&gt;(i)&lt;/em&gt; the actual and compelled CAs
are from the same country and &lt;em&gt;(ii)&lt;/em&gt; the certificate differs from the one in
the history but the issuing CA has not changed. At the same time, the number of
false positives is greatly reduced this way, which is vital to get the user&amp;#x2019;s
attention in this scenario.&lt;/p&gt;

&lt;h2 id="user-failure"&gt;User Failure&lt;/h2&gt;

&lt;p&gt;&lt;img src="http://matthias.vallentin.net/blog/2010/05/certlock.png" alt="CertLock warning" style="float: right; margin-left: 10px; margin-bottom: 10px" align="right"&gt;
When I asked two of my friends what their impression was of CertLock&amp;#x2019;s displayed
warning, I was surprised to hear &amp;#x201C;it looks like spam - what does Russia have to
do with the site I am visiting?&amp;#x201D; Granted, the authors acknowledge the room for
improvement on the user interface, and their intention to brush up the warning&amp;#x2019;s
design. However, given the population&amp;#x2019;s
&lt;a href="http://www.youtube.com/watch?v=uEP7uti0PDw"&gt;limited&lt;/a&gt;
&lt;a href="http://www.youtube.com/watch?v=NAWe7Xk5e4A"&gt;geographic&lt;/a&gt;
&lt;a href="http://www.youtube.com/watch?v=lj3iNxZ8Dww"&gt;understanding&lt;/a&gt;
of international relations, this may be a challenging task.&lt;/p&gt;

&lt;h1 id="retrospective-analysis"&gt;Retrospective Analysis&lt;/h1&gt;

&lt;p&gt;In addition to the aspects above which highlight the need for better proactive
defenses, opportunities for forensic analysis are equally important to
appreciate. Network traffic traces may contain evidence of the compelled
certificate creation attack. &lt;a href="http://www.bro-ids.org"&gt;Bro&lt;/a&gt; already supports
&lt;a href="http://taosecurity.blogspot.com/2009/03/bro-ssl-certificate-details.html"&gt;extraction of certificates&lt;/a&gt; which minimizes the programmatic
efforts required to implement a detector. To address the problem of ground
truth, the traces should ideally span long time frames and be recorded from
multiple vantage points. In the future, I plan to write a Bro script to detect
this type of TLS tampering, which could be particularly useful to unveil
targeted attacks, as these tend to live under the radar of the standard network
intrusion detection system.&lt;/p&gt;

&lt;p&gt;&lt;em class="caption"&gt;Update&lt;/em&gt; &lt;em class="date"&gt;March 3, 2011&lt;/em&gt;
It is great to see some momentum in the community regarding this topic. Erik
Hjelmvik blogs about the &lt;a href="http://www.netresec.com/?page=Blog&amp;amp;month=2011-03&amp;amp;post=Network-Forensic-Analysis-of-SSL-MITM-Attacks"&gt;forensic analysis of TLS certificates&lt;/a&gt; to
detect man-in-the-middle attacks.  To extract certifcate files, he suggests
NetworkMiner, a GUI tool for Windows. &lt;/p&gt;

&lt;p&gt;Well, Bro&amp;#x2019;s TLS analyzer also provides this basic extraction functionality, yet
more beyond that. For example, here&amp;#x2019;s a subset of events the TLS analyzer
generates:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code&gt;ssl_conn_weak(name: string, c: connection)&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code&gt;ssl_certificate_seen(c: connection, is_server: bool)&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code&gt;ssl_certificate(c: connection, cert: X509, is_server: bool)&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code&gt;ssl_conn_attempt(c: connection, version: count, 
      ciphers: cipher_suites_list)&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code&gt;ssl_conn_server_reply(c: connection, version: count, 
      ciphers: cipher_suites_list)&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code&gt;ssl_conn_established(c: connection, version: count, cipher_suite: count)&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code&gt;process_X509_extensions(c: connection, ex: X509_extension)&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code&gt;ssl_session_insertion(c: connection, id: SSL_sessionID)&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code&gt;ssl_conn_reused(c: connection, session_id: SSL_sessionID)&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code&gt;ssl_X509_error(c: connection, err: int, err_string: string)&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each time Bro raises one of these events, user-defined handler code executes.
This allows for a very fine-grained analysis of TLS traffic and certificates.
Such a high-level abstraction of activity exists for a numerous network
protocols, which makes Bro &amp;#x201C;the Python/Ruby&amp;#x201D; for traffic analysis. That said,
there are plenty of domain-specific primitives available in Bro to effectively
write a detector for the compelled certificate attack.&lt;/p&gt;

&lt;img src="http://feeds.feedburner.com/~r/blog-atom/~4/WkC8i_SVLvQ" height="1" width="1"/&gt;</content>
    <summary type="html">&lt;p&gt;Is your SSL secure? I’m not sure that the answer is quite so binary after
reading the paper &lt;a href="http://files.cloudprivacy.net/ssl-mitm.pdf"&gt;Certified Lies: Detecting and Defeating Government
Interception Attacks Against SSL&lt;/a&gt; by &lt;a href="http://www.dubfire.net"&gt;Christopher
Soghoian&lt;/a&gt; and &lt;a href="http://www.sidstamm.com"&gt;Side Stamm&lt;/a&gt;. In
a so-called &lt;em&gt;compelled certificate creation attack&lt;/em&gt;, a government agency forces
a CA to issue a fake certificate to intercept an SSL-encrypted (or more
precisly: TLS-encrypted) session without triggering a warning in the victim’s
browser. This works seamlessly, because it does not matter who issued a
specific certificate as long as the browser sees a valid chain of trust
terminated with a known root certificate. Although the authors claim that this
attack is exercised in practice, data-driven evidence is yet lacking.&lt;/p&gt;</summary>
  <feedburner:origLink>http://matthias.vallentin.net/blog/2010/03/should-you-trust-your-ssl-certificate/</feedburner:origLink></entry>
  <entry>
    <id>tag:matthias.vallentin.net,2009-08-16:/blog/2009/08/email-attachment-processing-with-bro/</id>
    <title type="html">Email Attachment Processing with Bro</title>
    <published>2009-08-16T07:00:00Z</published>
    <updated>2011-04-13T07:00:00Z</updated>
    <link rel="alternate" href="http://feeds.matthias.vallentin.net/~r/blog-atom/~3/GEFkzsIXsSw/" />
    <content type="html">&lt;p&gt;Malware that spreads via email is nothing new. Particularly
&lt;a href="http://www.scribd.com/doc/13731776/Tracking-GhostNet-Investigating-a-Cyber-Espionage-Network"&gt;targeted&lt;/a&gt; &lt;a href="http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-746.html"&gt;attacks&lt;/a&gt; against politically sensitive
institutions or individuals consist of well socially engineered mails and often
ship with custom &lt;a href="http://www.defcon.org/html/defcon-17/dc-17-speakers.html#Richard"&gt;0-day malware&lt;/a&gt; in the form of email attachments. In
order to extract such malicious attachments, I wrote a
&lt;a href="http://www.bro-ids.org"&gt;Bro&lt;/a&gt; policy script which records suspicious
attachments to disk for later analysis. A possible application scenario would
be to scan office documents for malicious JavaScript or executables for
viruses. Another option would be &lt;a href="http://wiki.github.com/sethhall/bro_scripts/the-malware-hash-registry-and-bro-ids"&gt;hashing the attachment directly in
Bro&lt;/a&gt; and comparing it against a publicly available registry, such
as Seth Hall illustrates for HTTP traffic.&lt;/p&gt;

&lt;p&gt;The script to extract attachments works by registering a callback handler for
the &lt;code&gt;Content-Type&lt;/code&gt; header in an SMTP session. Then both MIME type and the
name of the attachment is examined. If either looks suspicious, Bro generates a
&lt;code&gt;SensitiveMIMEType&lt;/code&gt; or &lt;code&gt;SensitiveExtension&lt;/code&gt;
&lt;a href="http://blog.icir.org/2008/03/telling-bro-what-important.html"&gt;NOTICE&lt;/a&gt;.
The user can customize the the analyzer behavior in many ways. To change the
directory where the attachments are stored on disk, one can redefine the
&lt;code&gt;attachment_dir&lt;/code&gt; variable:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span class="k"&gt;redef&lt;/span&gt; &lt;span class="n"&gt;Email&lt;/span&gt;&lt;span class="nn"&gt;::&lt;/span&gt;&lt;span class="n"&gt;attachment_dir&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"foo"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The script stores the attachments by default, but this behavior can easily
changed via:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span class="c1"&gt;# Whether attachments with sensitive MIME types should be stored.&lt;/span&gt;
&lt;span class="k"&gt;redef&lt;/span&gt; &lt;span class="n"&gt;Email&lt;/span&gt;&lt;span class="nn"&gt;::&lt;/span&gt;&lt;span class="n"&gt;store_sensitive_mime_types&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;F&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;# Whether attachments with sensitive file extensions should be stored.&lt;/span&gt;
&lt;span class="k"&gt;redef&lt;/span&gt; &lt;span class="n"&gt;Email&lt;/span&gt;&lt;span class="nn"&gt;::&lt;/span&gt;&lt;span class="n"&gt;store_sensitive_extensions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;F&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;It is also possible to restrict or extend the regular expression used to
determine whether an attachment is sensitive or not:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span class="c1"&gt;# Deem only application\/octet-stream as suspicious.&lt;/span&gt;
&lt;span class="k"&gt;redef&lt;/span&gt; &lt;span class="n"&gt;Email&lt;/span&gt;&lt;span class="nn"&gt;::&lt;/span&gt;&lt;span class="n"&gt;sensitive_mime_types&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sr"&gt;/application\/octet-stream/&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;# Restrict sensitive extensions to office documents and executables.&lt;/span&gt;
&lt;span class="k"&gt;redef&lt;/span&gt; &lt;span class="n"&gt;Email&lt;/span&gt;&lt;span class="nn"&gt;::&lt;/span&gt;&lt;span class="n"&gt;sensitive_extensions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
    &lt;span class="sr"&gt;/[pP][dD][fF]$/&lt;/span&gt;
  &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="sr"&gt;/[dD][oO][cC][xX]?$/&lt;/span&gt;
  &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="sr"&gt;/[xX][lL][sS]$/&lt;/span&gt;
  &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="sr"&gt;/[pP][pP][sStT]$/&lt;/span&gt;
  &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="sr"&gt;/[eE][xX][eE]$/&lt;/span&gt;
  &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="sr"&gt;/[cC][oO][mM]$/&lt;/span&gt;
  &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="sr"&gt;/[bB][aA][tT]$/&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The script generates a file of the form &lt;code&gt;ID-filename&lt;/code&gt; where &lt;code&gt;ID&lt;/code&gt; is a unique
attachment ID that is monotonically increasing and &lt;code&gt;filename&lt;/code&gt; is the name of
the attachment or just the MIME type if the attachment does not have a name.&lt;/p&gt;

&lt;p&gt;The script is part of the &lt;a href="http://git.bro-ids.org/bro-scripts.git"&gt;Bro scripts&lt;/a&gt;
git repository where you can always &lt;a href="http://git.bro-ids.org/bro-scripts.git/blob_plain/HEAD:/mime-attachment.bro"&gt;download the most recent version&lt;/a&gt;.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/blog-atom/~4/GEFkzsIXsSw" height="1" width="1"/&gt;</content>
    <summary type="html">&lt;p&gt;Malware that spreads via email is nothing new. Particularly
&lt;a href="http://www.scribd.com/doc/13731776/Tracking-GhostNet-Investigating-a-Cyber-Espionage-Network"&gt;targeted&lt;/a&gt; &lt;a href="http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-746.html"&gt;attacks&lt;/a&gt; against politically sensitive
institutions or individuals consist of well socially engineered mails and often
ship with custom &lt;a href="http://www.defcon.org/html/defcon-17/dc-17-speakers.html#Richard"&gt;0-day malware&lt;/a&gt; in the form of email attachments. In
order to extract such malicious attachments, I wrote a
&lt;a href="http://www.bro-ids.org"&gt;Bro&lt;/a&gt; policy script which records suspicious
attachments to disk for later analysis. A possible application scenario would
be to scan office documents for malicious JavaScript or executables for
viruses. Another option would be &lt;a href="http://wiki.github.com/sethhall/bro_scripts/the-malware-hash-registry-and-bro-ids"&gt;hashing the attachment directly in
Bro&lt;/a&gt; and comparing it against a publicly available registry, such
as Seth Hall illustrates for HTTP traffic.&lt;/p&gt;</summary>
  <feedburner:origLink>http://matthias.vallentin.net/blog/2009/08/email-attachment-processing-with-bro/</feedburner:origLink></entry>
  <entry>
    <id>tag:matthias.vallentin.net,2008-05-10:/blog/2008/05/the-doom-of-client-side-wireless-network-security/</id>
    <title type="html">The Doom of Client-Side Wireless Network Security</title>
    <published>2008-05-10T07:00:00Z</published>
    <updated>2008-05-10T07:00:00Z</updated>
    <link rel="alternate" href="http://feeds.matthias.vallentin.net/~r/blog-atom/~3/_o5Q4pANEbs/" />
    <content type="html">&lt;p&gt;HD Moore &lt;a href="http://hamsterswheel.com/techblog/?p=55"&gt;recently announced&lt;/a&gt; the
integration of the &lt;a href="http://www.theta44.org/karma"&gt;KARMA&lt;/a&gt; tools with the
&lt;a href="http://www.metasploit.com"&gt;metasploit&lt;/a&gt; framework. The implications of this
fusion are devastating. In an interview with Patrick Gray, HD presents the new
powerful capabilities that take client-side wireless exploitation to a new
level. Technically, HD rewrote parts of the original KARMA driver, included 
&lt;a href="http://digininja.org"&gt;some&lt;/a&gt;
&lt;a href="http://blog.metasploit.com/2008/02/rise-security-vs-asus-eee-pc.html"&gt;patches&lt;/a&gt;,
and integrated the KARMA user-land daemons into the metasploit framework.&lt;/p&gt;

&lt;p&gt;To illustrate the new potent features of metasploit, consider the following
scenario. A user opens his laptop on the plane to watch a DVD. If he ever
connected to an insecure access point, it will be in his list of list of
preferred wireless networks. Since the operating system attempts to connect to
all known wireless networks at boot time or when waking up from hibernation, it
sends out probes to look for known networks. An attacker, a couple of rows
behind, responds to the probes, provides an IP address to victim by DHCP and is
now rigged up to launch a multitude of client-side attacks.&lt;/p&gt;

&lt;p&gt;Unaware of being owned, the victim&amp;#x2019;s mail client periodically tries to re-send
emails laying around in the outbox. The DNS request for the SMTP server is
intercepted by the attacker who returns his own address. Further, he mimics the
entire SMTP connection handshake when the victim connects. Thus the victim
sends his emails directly to the attacker through a fake SMTP channel.
This scenario extends of course to any other plain-text protocol (HTTP, FTP,
POP3, etc.). Clearly, the dominant position of the attacker yields
ample opportunity for more sophisticated client-side wireless attacks, as the
next examples by HD show.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Massive cookie stealing&lt;/strong&gt;.  Traditional &lt;a href="http://community.corest.com/~hochoa/wifizoo/index.html"&gt;cookie
stealing&lt;/a&gt; presupposes
that the victim &lt;em&gt;actively&lt;/em&gt; transmits a cookie from a particular web site in
order to be captured by the attacker. In contrast, this attack only requires
a single HTTP request to originate from the victim to hijack &lt;em&gt;all&lt;/em&gt; cookies
from the victim&amp;#x2019;s browser. In general, only the requested site is allowed to
read that particular cookie. With a malicous server responding to all client
request, the attacker can bypass this restriction. When a victim sends a HTTP
request, the attacker returns a chosen list of web sites (say the current top
500 sites) and the browser then tries to connect to each site with the
corresponding cookie. Because all sites resolve back to the same attacker&amp;#x2019;s
hostname, all cookies arrive in the hands of the attacker. Thus, by merely
trying to access an arbitrary page in the Internet, the victim exposed all
his cookies that correspond to entry in the attacker&amp;#x2019;s list of sites.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Browser credential theft&lt;/strong&gt;. The next interesting step from the attacker&amp;#x2019;s
perspective is it to hunt for usernames and passwords. To this end, HD wrote
a little script storing all form information of the top 500 websites, e.g.
forms asking for personal data, SSN, bank account number, MySpace and
Facebook logins, and so on. When the victim visits &lt;em&gt;any&lt;/em&gt; arbitrary website,
the attackers returns a page full of frames that open the pages from the list
and contain the saved form snippets. If the victim enabled automatic form
fill-out in his browser preferences, the forms are auto-populated with
sensitive user data. In addition to the form snippets the attacker delivers a
malicious piece of JavaScript that grabs the form contents after filled out
by the browser and sends them back to the attacker.  Hence a single page
visit results in a complete compromise of the victim&amp;#x2019;s login credentials and
personal data.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Web-based SMB relay exploitation&lt;/strong&gt;. Worse, if the victim happens to use
Internet Explorer, a &lt;a href="http://perimetergrid.com/wp/2007/11/27/smb-reflection-made-way-too-easy"&gt;weakness&lt;/a&gt; in Microsoft&amp;#x2019;s SMB file
sharing authentication protocol can be &lt;a href="http://www.metasploit.net/data/confs/blackhat2007/tactical_paper.pdf"&gt;exploited&lt;/a&gt; to
own the victim&amp;#x2019;s machine completely. By including a link pointing to a
network file share, the victim is forced to authenticate to the attacker&amp;#x2019;s
fake SMB server. This exposes the challenge key that can in turn fed back to
the client.  Essentially, the victim now authenticates against himself. Once
connected, the incoming connection is disconnected and the new session serves
as a vehicle to execute arbitrary shellcode.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Who knows what HD&amp;#x2019;s new toy features beyond the sketched scenarios? In any
case, these attack vectors witness how broken the actual model of wireless
security on the client-side is. While the industry tries to fix wireless
encryption schemes, the actual targets, the users themselves, are not
considered in the equation. These new techniques essentially render networking
in any wireless environment tremendously insecure.&lt;/p&gt;

&lt;img src="http://feeds.feedburner.com/~r/blog-atom/~4/_o5Q4pANEbs" height="1" width="1"/&gt;</content>
    <summary type="html">&lt;p&gt;HD Moore &lt;a href="http://hamsterswheel.com/techblog/?p=55"&gt;recently announced&lt;/a&gt; the
integration of the &lt;a href="http://www.theta44.org/karma"&gt;KARMA&lt;/a&gt; tools with the
&lt;a href="http://www.metasploit.com"&gt;metasploit&lt;/a&gt; framework. The implications of this
fusion are devastating. In an interview with Patrick Gray, HD presents the new
powerful capabilities that take client-side wireless exploitation to a new
level. Technically, HD rewrote parts of the original KARMA driver, included 
&lt;a href="http://digininja.org"&gt;some&lt;/a&gt;
&lt;a href="http://blog.metasploit.com/2008/02/rise-security-vs-asus-eee-pc.html"&gt;patches&lt;/a&gt;,
and integrated the KARMA user-land daemons into the metasploit framework.&lt;/p&gt;</summary>
  <feedburner:origLink>http://matthias.vallentin.net/blog/2008/05/the-doom-of-client-side-wireless-network-security/</feedburner:origLink></entry>
  <entry>
    <id>tag:matthias.vallentin.net,2007-04-02:/blog/2007/04/writing-a-linux-kernel-driver-for-an-unknown-usb-device/</id>
    <title type="html">Writing a Linux Kernel Driver for an Unknown USB Device</title>
    <published>2007-04-02T07:00:00Z</published>
    <updated>2011-06-01T07:00:00Z</updated>
    <link rel="alternate" href="http://feeds.matthias.vallentin.net/~r/blog-atom/~3/tGrePi05t1k/" />
    <content type="html">&lt;p&gt;This article explains the creation process of a Linux kernel device driver for
an undocumented USB device. After having reverse-engineered the USB
communication protocol, I present the architecture of the USB device driver. In
addition to the kernel driver I introduce a simple user-space tool that can be
used to control the device. Although I have to delve into the specifics of a
particular device, the process can be applied to other USB devices as well.&lt;/p&gt;

&lt;h1 id="introduction"&gt;Introduction&lt;/h1&gt;

&lt;p&gt;&lt;img src="http://matthias.vallentin.net/blog/2007/04/dreamcheeky-ml.jpg" alt="Dream Cheeky USB missile launcher" style="float: right; margin-left: 10px; margin-bottom: 10px" align="right"&gt;

&lt;p&gt;To facilitate USB programming, the USB interface is accessible from user-space
with &lt;a href="http://libusb.sourceforge.net"&gt;libusb&lt;/a&gt;, a programming API concealing
low-level kernel interaction. The proper way to write a device driver for the
missile launcher would hence be to leverage this API and ignore any kernel
specifics. Nevertheless, I wanted to get involved with kernel programming and
decided thus to write a kernel module despite the increased complexity and
higher effort.&lt;/p&gt;

&lt;p&gt;The remainder of this article is structured as follows. After pointing to some
related work, I give a quick USB overview. Thereafter, I present the
reverse-engineering process to gather the unknown USB commands steering the
missile launcher. To come up with a full-featured kernel device driver, I
describe the kernel module architecture which incorporates the derived control
commands. Finally, I demonstrate a simple tool in user-space that makes use of
the driver.&lt;/p&gt;

&lt;h1 id="related-work"&gt;Related Work&lt;/h1&gt;

&lt;p&gt;Apparently I have not been the only one who played with this gadget. However,
none of the existing approaches I have encountered pursue the creation of a
Linux device driver for the &lt;em&gt;kernel&lt;/em&gt;. &lt;a href="http://kim.tensta.gannert.se/projects/launcher"&gt;The Launcher
Library&lt;/a&gt; provides a user-space
library based on libusb. &lt;a href="http://fatal.se/fulhack/ahmissile"&gt;AHmissile&lt;/a&gt; is a
GTK+ control tool; a ncurses application is
&lt;a href="http://www.amctrl.com/rocketlauncher.html"&gt;available&lt;/a&gt;, too.
Apple users get happy with the &lt;a href="http://dgwilson.wordpress.com"&gt;USB missile launcher
NZ&lt;/a&gt; project. Moreover, the python implementation
&lt;a href="http://scott.weston.id.au/software/pymissile"&gt;pymissile&lt;/a&gt; supports a missile
launcher of a different manufacturer. The author combined the missile
launcher with a webcam in order to to create an automated sentry guard reacting
on motion. I will return to these funky ideas later.&lt;/p&gt;

&lt;h1 id="usb-primer"&gt;USB primer&lt;/h1&gt;

&lt;p&gt;The universal serial bus (USB) connects a host computer with numerous
peripheral devices. It was designed to unify a wide range of slow and old buses
(parallel, serial, and keyboard connections) into a single bus type. It is
topologically not constructed as a bus, but rather as a tree of several
point-to-point links. The USB host controller periodically polls each device if
it has data to send. With this design, no device can send before it has not been
asked to do so, resulting in a plug-and-play-friendly architecture.&lt;/p&gt;

&lt;p&gt;&lt;img src="http://matthias.vallentin.net/blog/2007/04/usb-device.png" alt="USB device" style="float: right; margin-left: 10px; margin-bottom: 10px" align="right"&gt;

&lt;ul&gt;
&lt;li&gt;Control&lt;/li&gt;
  &lt;li&gt;Interrupt&lt;/li&gt;
  &lt;li&gt;Bulk&lt;/li&gt;
  &lt;li&gt;Isochronous&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;&lt;em&gt;Control endpoints&lt;/em&gt; are generally used to control the USB device
asynchronously, i.e. sending commands to it or retrieving status information
about it. Every device possesses a control &amp;#x201C;endpoint 0&amp;#x201D; which is used by the USB
core to initialize the device. &lt;em&gt;Interrupt endpoints&lt;/em&gt; occur periodically
and transfer small fixed-size data portions every time when the USB host asks
the device. They are commonly used by mice and keyboards as primary transport
method. As &lt;em&gt;bulk&lt;/em&gt; and &lt;em&gt;isochronous endpoints&lt;/em&gt; are not relevant for
our missile launcher, I skip their discussion. An excellent introduction from a
programming perspective gives the &lt;a href="http://lwn.net/Kernel/LDD3"&gt;Linux Device
Drivers&lt;/a&gt; book. Below is
some output from &lt;code&gt;lsusb -v&lt;/code&gt; providing detailed information about the missile
launcher.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Bus 005 Device 004: ID 1941:8021
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               1.10
  bDeviceClass            0 (Defined at Interface level)
  bDeviceSubClass         0
  bDeviceProtocol         0
  bMaxPacketSize0         8
  idVendor           0x1941
  idProduct          0x8021
  bcdDevice            1.00
  iManufacturer           0
  iProduct                0
  iSerial                 0
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength           34
    bNumInterfaces          1
    bConfigurationValue     1
    iConfiguration          0
    bmAttributes         0xa0
      Remote Wakeup
    MaxPower              100mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           1
      bInterfaceClass         3 Human Interface Devices
      bInterfaceSubClass      0 No Subclass
      bInterfaceProtocol      0 None
      iInterface              0
        HID Device Descriptor:
          bLength                 9
          bDescriptorType        33
          bcdHID               1.00
          bCountryCode            0 Not supported
          bNumDescriptors         1
          bDescriptorType        34 Report
          wDescriptorLength      52
         Report Descriptors:
           ** UNAVAILABLE **
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0008  1x 8 bytes
        bInterval              10
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The output is structured and indented like a typical USB device. First, vendor
and product ID uniquely identify this USB gadget. These IDs are used by the USB
core to decide which driver to give a device to. Moreover, hotplug scripts can
decide which driver to load when a particular device is plugged in. Next, we
can read off the maximum power usage (100 mA) in the configuration section. The
subordinate interface contains apparently one interrupt IN endpoint (besides
the control endpoint 0) that can be accessed at address &lt;code&gt;0x81&lt;/code&gt;. Because it is
an IN endpoint, it returns status information from the device. To handle the
incoming data we first need to understand the missile launcher control
protocol.&lt;/p&gt;

&lt;h1 id="reverse-engenireering-the-usb-protocol"&gt;Reverse-engenireering the USB protocol&lt;/h1&gt;

&lt;p&gt;The first step involves reverse-engineering (or &amp;#x201C;snooping&amp;#x201D;) the USB
communication protocol spoken by the binary Windows driver. One approach would
be to consign the device in a VMware and capture the exchanged data on the host
system. But since several tools to analyze USB traffic already exist, the easier
solution is to rely on one of those. The most popular free application appears
to be &lt;a href="http://sourceforge.net/projects/usbsnoop"&gt;SnoopyPro&lt;/a&gt;.  Surprisingly I do
not have Windows box at hand, so I had to install the binary driver together
with SnoopyPro in a VMware.&lt;/p&gt;

&lt;p&gt;In order to capture all relevant USB data and intercept all device control
commands, the missile launcher has to perform every possible action while being
monitored: moving the two axes alone and together, shooting, and moving to the
limiting axes boundaries (which will trigger a notification that the axes
cannot be moved further in one direction). While analyzing the &lt;a href="http://matthias.vallentin.net/dump/snoopy-pro.xml"&gt;SnoopyPro
dump&lt;/a&gt;, one can easily discover the control commands sent
to the missile launcher. As an example, the Figure below shows an 8 byte
transfer buffer.  When moving the missile launcher to the right, the buffer
holds &lt;code&gt;0x00000008&lt;/code&gt;. Moving the launcher up changes the buffer contents to
&lt;code&gt;0x00000001&lt;/code&gt;. It is apparently very easy to deduce the control bytes used to
control the missile launcher. Unless a &amp;#x201C;stop&amp;#x201D; command (&lt;code&gt;0x00000000&lt;/code&gt;) is sent to
the device, it keeps the state of the last command.  This means if the &amp;#x201C;down&amp;#x201D;
command is issued, the device continues to turn until it receives a new
command. If it is not possible to move further, the motor keeps up running and
the gears crack with a unbearable painful sound.  Upon closer examination, the
interrupt IN endpoint buffer varies depending on the current device position.
Whensoever an axis reaches its boundary (and creates the maddening sound), the
device detects it and changes the interrupt buffer contents accordingly. This
means of notification can be leveraged by the kernel developer to implement a
boundary checking mechanism sending a stop command as soon as the missile
launcher runs against a wall.&lt;/p&gt;

&lt;p&gt;&lt;img src="http://matthias.vallentin.net/blog/2007/04/snoopypro.png" alt="USB device" style="display: block; margin: 0 auto;"&gt;

&lt;p&gt;Here is an excerpt of the driver source showing the complete list of control
commands that can be sent to the device.&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span class="cp"&gt;#define ML_STOP         0x00&lt;/span&gt;
&lt;span class="cp"&gt;#define ML_UP           0x01&lt;/span&gt;
&lt;span class="cp"&gt;#define ML_DOWN         0x02&lt;/span&gt;
&lt;span class="cp"&gt;#define ML_LEFT         0x04&lt;/span&gt;
&lt;span class="cp"&gt;#define ML_RIGHT        0x08&lt;/span&gt;
&lt;span class="cp"&gt;#define ML_UP_LEFT      (ML_UP | ML_LEFT)&lt;/span&gt;
&lt;span class="cp"&gt;#define ML_DOWN_LEFT    (ML_DOWN | ML_LEFT)&lt;/span&gt;
&lt;span class="cp"&gt;#define ML_UP_RIGHT     (ML_UP | ML_RIGHT)&lt;/span&gt;
&lt;span class="cp"&gt;#define ML_DOWN_RIGHT   (ML_DOWN | ML_RIGHT)&lt;/span&gt;
&lt;span class="cp"&gt;#define ML_FIRE         0x10&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The following bytes appear in the buffer of the interrupt IN endpoint (shown as
comment) and indicate that a boundary has been reached.&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span class="cp"&gt;#define ML_MAX_UP       0x80        &lt;/span&gt;&lt;span class="cm"&gt;/* 80 00 00 00 00 00 00 00 */&lt;/span&gt;&lt;span class="cp"&gt;&lt;/span&gt;
&lt;span class="cp"&gt;#define ML_MAX_DOWN     0x40        &lt;/span&gt;&lt;span class="cm"&gt;/* 40 00 00 00 00 00 00 00 */&lt;/span&gt;&lt;span class="cp"&gt;&lt;/span&gt;
&lt;span class="cp"&gt;#define ML_MAX_LEFT     0x04        &lt;/span&gt;&lt;span class="cm"&gt;/* 00 04 00 00 00 00 00 00 */&lt;/span&gt;&lt;span class="cp"&gt;&lt;/span&gt;
&lt;span class="cp"&gt;#define ML_MAX_RIGHT    0x08        &lt;/span&gt;&lt;span class="cm"&gt;/* 00 08 00 00 00 00 00 00 */&lt;/span&gt;&lt;span class="cp"&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;With all required control information in place, let&amp;#x2019;s now adopt the programmer&amp;#x2019;s
perspective and delve into the land of kernel programming.&lt;/p&gt;

&lt;h1 id="the-device-driver"&gt;The device driver&lt;/h1&gt;

&lt;p&gt;Writing code for the kernel is an art by itself and I will only touch the tip of
the iceberg. To get a deeper understanding I recommend the books &lt;a href="http://lwn.net/Kernel/LDD3"&gt;Linux Device
Drivers&lt;/a&gt; and &lt;a href="http://www.oreilly.com/catalog/understandlk"&gt;Understanding the Linux
Kernel&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;As for many other disciplines the separation of &lt;em&gt;mechanism&lt;/em&gt; and &lt;em&gt;policy&lt;/em&gt; is a
fundamental paradigm a programmer should follow. The mechanism provides the
capabilities whereas the policy expresses rules how to use those capabilities.
Different environments generally access the hardware in different ways. It is
hence imperative to write &lt;em&gt;policy-neutral&lt;/em&gt; code: a driver should make the
hardware available without imposing constraints.&lt;/p&gt;

&lt;p&gt;A nice feature of Linux is the ability to dynamically link object code to the
running kernel. That piece of object code is called a kernel &lt;em&gt;module&lt;/em&gt;.
Linux distinguishes between three basic device types that a module can
implement:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Character devices&lt;/li&gt;
  &lt;li&gt;Block devices&lt;/li&gt;
  &lt;li&gt;Network interfaces&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;A &lt;em&gt;Character (char) device&lt;/em&gt; transfers a stream of bytes from and to the
user process. The module therefore implements system calls such as
&lt;em&gt;open&lt;/em&gt;, &lt;em&gt;close&lt;/em&gt;, &lt;em&gt;read&lt;/em&gt;, &lt;em&gt;write&lt;/em&gt; and &lt;em&gt;ioctl&lt;/em&gt;.
A char device looks like a file, except that file is &amp;#x201C;seekable&amp;#x201D; and most devices
operate sequentially. Examples for char devices are the text console
(&lt;code&gt;/dev/console&lt;/code&gt;) and serial ports (&lt;code&gt;/dev/ttyS0&lt;/code&gt;). Most simple
hardware devices are driven by char drivers. Discussing &lt;em&gt;block devices&lt;/em&gt;
and &lt;em&gt;network interfaces&lt;/em&gt; goes beyond the scope of this article, please
refer to the specified literature for details.&lt;/p&gt;

&lt;p&gt;Besides this classification, other orthogonal ways exist. As an example, USB
devices are implemented as USB modules but can show up as char devices (like
our missile launcher), block devices (USB sticks, say), or network interfaces
(a USB Ethernet interface). Let us now look at the rough structure of a USB
kernel module and then turn to particularities of the missile launcher.&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;usb_ml&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="cm"&gt;/* One structure for each connected device */&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;usb_device_id&lt;/span&gt; &lt;span class="n"&gt;ml_table&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;USB_DEVICE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ML_VENDOR_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ML_PRODUCT_ID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;ml_open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;inode&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;inode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;file&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="cm"&gt;/* open syscall */&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;ml_release&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;inode&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;inode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;file&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="cm"&gt;/* close syscall */&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;ssize_t&lt;/span&gt; &lt;span class="n"&gt;ml_write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;file&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;__user&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;user_buf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;size_t&lt;/span&gt;
        &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;loff_t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ppos&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="cm"&gt;/* write syscall */&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;file_operations&lt;/span&gt; &lt;span class="n"&gt;ml_fops&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;owner&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;    &lt;span class="n"&gt;THIS_MODULE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;write&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;    &lt;span class="n"&gt;ml_write&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;open&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;     &lt;span class="n"&gt;ml_open&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;release&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;  &lt;span class="n"&gt;ml_release&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;ml_probe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;usb_interface&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;usb_device_id&lt;/span&gt;
        &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="cm"&gt;/* called when a USB device is connected to the computer. */&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;ml_disconnect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;usb_interface&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="cm"&gt;/* called when unplugging a USB device. */&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;usb_driver&lt;/span&gt; &lt;span class="n"&gt;ml_driver&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"missile_launcher"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id_table&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ml_table&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;probe&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ml_probe&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;disconnect&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ml_disconnect&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;__init&lt;/span&gt; &lt;span class="nf"&gt;usb_ml_init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="cm"&gt;/* called on module loading */&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="n"&gt;__exit&lt;/span&gt; &lt;span class="nf"&gt;usb_ml_exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="cm"&gt;/* called on module unloading */&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;module_init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;usb_ml_init&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;module_exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;usb_ml_exit&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Apart from some global variables, helper functions, and interrupt handlers,
this is already the entire kernel module! But let&amp;#x2019;s start off step by step. The
USB driver is represented by a &lt;code&gt;struct usb_driver&lt;/code&gt; containing some function
callbacks and variables identifying the USB driver. When the module is loaded
via the &lt;em&gt;insmod&lt;/em&gt; program, the &lt;code&gt;__init usb_ml_init(void)&lt;/code&gt; function is executed
which registers the driver with the USB subsystem. When the module is unloaded,
&lt;code&gt;__exit usb_ml_exit(void)&lt;/code&gt; is called which deregisters the driver from the USB
subsystem. The &lt;code&gt;__init&lt;/code&gt; and &lt;code&gt;__exit&lt;/code&gt; tokens indicate that these functions are
only called at initialization and exit time. Having loaded the module, the
&lt;em&gt;probe&lt;/em&gt; and &lt;em&gt;disconnect&lt;/em&gt; function callbacks are set up. In the probe function
callback, which is called when the device is being plugged in, the driver
initializes any local data structures used to manage the USB device. For
example, it allocates memory for the &lt;code&gt;struct usb_ml&lt;/code&gt; which contains run-time
status information about the connected device. Here is an excerpt from the
beginning of the function:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;ml_probe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;usb_interface&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;usb_device_id&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;usb_device&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;udev&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;interface_to_usbdev&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;usb_ml&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;dev&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;usb_host_interface&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;iface_desc&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;usb_endpoint_descriptor&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;int_end_size&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;retval&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;ENODEV&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="n"&gt;udev&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;DBG_ERR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"udev is NULL"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;goto&lt;/span&gt; &lt;span class="n"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;dev&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;kzalloc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;usb_ml&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;GFP_KERNEL&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;DBG_ERR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"cannot allocate memory for struct usb_ml"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;retval&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;ENOMEM&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;goto&lt;/span&gt; &lt;span class="n"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;command&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ML_STOP&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="n"&gt;init_MUTEX&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&lt;/span&gt;&lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;sem&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;spin_lock_init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&lt;/span&gt;&lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;cmd_spinlock&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;udev&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;udev&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;interface&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;iface_desc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;interface&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;cur_altsetting&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="cm"&gt;/* Set up interrupt endpoint information. */&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&lt;/span&gt; &lt;span class="n"&gt;iface_desc&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;desc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bNumEndpoints&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;endpoint&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&lt;/span&gt;&lt;span class="n"&gt;iface_desc&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;desc&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(((&lt;/span&gt;&lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;bEndpointAddress&lt;/span&gt; &lt;span class="o"&gt;&lt;/span&gt; &lt;span class="n"&gt;USB_ENDPOINT_DIR_MASK&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;USB_DIR_IN&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="o"&gt;&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;bmAttributes&lt;/span&gt; &lt;span class="o"&gt;&lt;/span&gt; &lt;span class="n"&gt;USB_ENDPOINT_XFERTYPE_MASK&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt;
                    &lt;span class="n"&gt;USB_ENDPOINT_XFER_INT&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;int_in_endpoint&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;int_in_endpoint&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;DBG_ERR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"could not find interrupt in endpoint"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;goto&lt;/span&gt; &lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="cm"&gt;/* ... */&lt;/span&gt;

    &lt;span class="cm"&gt;/* We can register the device now, as it is ready. */&lt;/span&gt;
    &lt;span class="n"&gt;retval&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;usb_register_dev&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&lt;/span&gt;&lt;span class="n"&gt;ml_class&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="cm"&gt;/* ... */&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You might have noted the use of &lt;code&gt;goto&lt;/code&gt; statements in this code snippet. While
&lt;code&gt;goto&lt;/code&gt; statements are generally &lt;a href="http://www.acm.org/classics/oct95"&gt;considered
harmful&lt;/a&gt;, kernel programmers, however,
employ &lt;code&gt;goto&lt;/code&gt; statements to bundle error handling at a central place,
eliminating complex, highly-indented logic. The probe function allocates memory
for the internal device structure, initializes semaphores and spin-locks, and
sets up endpoint information.  Somewhat later in the function, the device is
being registered. The device is now ready to be accessed from user space via
system calls. I will discuss the simple user-space tool accessing the missile
launcher shortly. Yet before that, I present the communication primitives used
to send data to the device.&lt;/p&gt;

&lt;p&gt;The Linux USB implementation uses a &lt;em&gt;USB request block&lt;/em&gt; (URB) as &amp;#x201C;data
carrier&amp;#x201D; to communicate with USB devices. URBs are like data messages that are
sent asynchronously from and to endpoints. Remember that the USB standard
includes four types of endpoints. Likewise, four different types of URBs exist,
namely control, interrupt, bulk, and isochronous URBs. Once an URB has been
allocated and initialized by the driver, it is be submitted to the USB core
which forwards it to the device. If the URB was successfully delivered to the
USB core, a &lt;em&gt;completion handler&lt;/em&gt; is executed. Then the USB core returns
control to the device driver.&lt;/p&gt;

&lt;p&gt;As our missile launcher features two endpoints (endpoint 0 and the interrupt
endpoint), we have to deal with both control and interrupt URBs. The
reverse-engineered commands are basically packed into an control URB and then
sent out to the device. Also, we continuously receive status information from
the periodic interrupt URBs. For example, to send simple data to the missile
launcher, the function &lt;code&gt;usb_control_msg&lt;/code&gt; is used:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span class="n"&gt;memset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="cm"&gt;/* The interrupt-in-endpoint handler also modifies dev-command. */&lt;/span&gt;
&lt;span class="n"&gt;spin_lock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&lt;/span&gt;&lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;cmd_spinlock&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;command&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;spin_unlock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&lt;/span&gt;&lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;cmd_spinlock&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="n"&gt;retval&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;usb_control_msg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;udev&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;usb_sndctrlpipe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;udev&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;ML_CTRL_REQUEST&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;ML_CTRL_REQEUST_TYPE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;ML_CTRL_VALUE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;ML_CTRL_INDEX&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="o"&gt;&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;HZ&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;retval&lt;/span&gt; &lt;span class="o"&gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;DBG_ERR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"usb_control_msg failed (%d)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;retval&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;goto&lt;/span&gt; &lt;span class="n"&gt;unlock_exit&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The command &lt;code&gt;cmd&lt;/code&gt; is inserted into the buffer &lt;code&gt;buf&lt;/code&gt;
containing the data to be sent to the device. If the URB completes successfully,
the corresponding handler is executed. It performs nothing fancy, except telling
the driver that we launched a (yet uncorrected) command via the &lt;em&gt;write&lt;/em&gt;
syscall:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;ml_ctrl_callback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;urb&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;urb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;pt_regs&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;regs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;usb_ml&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;dev&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;urb&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;correction_required&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;We do not want the missile launcher hardware to be damaged by neither sending
improper commands nor sending any commands when it reached an axis boundary.
Ideally, whenever an axis boundary is reached (meaning that the missile launcher
cannot turn further in one direction), the device should stop the movement in
the particular direction. The completion handler of the interrupt URB turns out
to be the right place to implement this idea:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;ml_int_in_callback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;urb&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;urb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;pt_regs&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;regs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="cm"&gt;/* ... */&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;int_in_buffer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&lt;/span&gt; &lt;span class="n"&gt;ML_MAX_UP&lt;/span&gt; &lt;span class="o"&gt;&lt;/span&gt; &lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;command&lt;/span&gt; &lt;span class="o"&gt;&lt;/span&gt; &lt;span class="n"&gt;ML_UP&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;command&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;~&lt;/span&gt;&lt;span class="n"&gt;ML_UP&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;correction_required&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;int_in_buffer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&lt;/span&gt; &lt;span class="n"&gt;ML_MAX_DOWN&lt;/span&gt; &lt;span class="o"&gt;&lt;/span&gt;
                &lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;command&lt;/span&gt; &lt;span class="o"&gt;&lt;/span&gt; &lt;span class="n"&gt;ML_DOWN&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;command&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;~&lt;/span&gt;&lt;span class="n"&gt;ML_DOWN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;correction_required&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;int_in_buffer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&lt;/span&gt; &lt;span class="n"&gt;ML_MAX_LEFT&lt;/span&gt; &lt;span class="o"&gt;&lt;/span&gt; &lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;command&lt;/span&gt; &lt;span class="o"&gt;&lt;/span&gt; &lt;span class="n"&gt;ML_LEFT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;command&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;~&lt;/span&gt;&lt;span class="n"&gt;ML_LEFT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;correction_required&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;int_in_buffer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&lt;/span&gt; &lt;span class="n"&gt;ML_MAX_RIGHT&lt;/span&gt; &lt;span class="o"&gt;&lt;/span&gt;
                &lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;command&lt;/span&gt; &lt;span class="o"&gt;&lt;/span&gt; &lt;span class="n"&gt;ML_RIGHT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;command&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;~&lt;/span&gt;&lt;span class="n"&gt;ML_RIGHT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;correction_required&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="cm"&gt;/* ... */&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The above code is used to set the &lt;code&gt;correction_required&lt;/code&gt; variable which triggers
a &amp;#x201C;correction&amp;#x201D; control URB: this URB contains simply the last command without
the harming bit. Remember that the URB callback functions run in &lt;em&gt;interrupt
context&lt;/em&gt; and thus should not perform any memory allocations, hold semaphores,
or cause anything putting the process to sleep. With this automatic correction
mechanism, the missile launcher is shielded from improper use. Again, it does
not impose policy constraints, it protects only the device.&lt;/p&gt;

&lt;h1 id="controlling-the-missile-launcher-from-user-space"&gt;Controlling the missile launcher from user-space&lt;/h1&gt;

&lt;p&gt;For most folks fun starts in here. One doesn&amp;#x2019;t kick the bucket when
dereferencing NULL-pointers and the good old libc is available, too. After
having loaded the kernel module, the missile launcher is accessible via
&lt;code&gt;/dev/ml0&lt;/code&gt;. A second missile launcher would show up as &lt;code&gt;/dev/ml1&lt;/code&gt; and so on.
Here is a very simple application to control the device:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span class="cp"&gt;#include fcntl.h&lt;/span&gt;
&lt;span class="cp"&gt;#include stdio.h&lt;/span&gt;
&lt;span class="cp"&gt;#include stdlib.h&lt;/span&gt;
&lt;span class="cp"&gt;#include unistd.h&lt;/span&gt;

&lt;span class="cp"&gt;#define DEFAULT_DEVICE      "/dev/ml0"&lt;/span&gt;
&lt;span class="cp"&gt;#define DEFAULT_DURATION    800&lt;/span&gt;

&lt;span class="cp"&gt;#define ML_STOP         0x00&lt;/span&gt;
&lt;span class="cp"&gt;#define ML_UP           0x01&lt;/span&gt;
&lt;span class="cp"&gt;#define ML_DOWN         0x02&lt;/span&gt;
&lt;span class="cp"&gt;#define ML_LEFT         0x04&lt;/span&gt;
&lt;span class="cp"&gt;#define ML_RIGHT        0x08&lt;/span&gt;
&lt;span class="cp"&gt;#define ML_FIRE         0x10&lt;/span&gt;

&lt;span class="cp"&gt;#define ML_FIRE_DELAY   5000&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;send_cmd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;fd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;retval&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="n"&gt;retval&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&lt;/span&gt;&lt;span class="n"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;retval&lt;/span&gt; &lt;span class="o"&gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;fprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"an error occured: %d&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;retval&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;fprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;usage: %s [-mslrudfh] [-t msecs]&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
            &lt;span class="s"&gt;"  -m      missile launcher [/dev/ml0]&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
            &lt;span class="s"&gt;"  -s      stop&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
            &lt;span class="s"&gt;"  -l      turn left&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
            &lt;span class="s"&gt;"  -r      turn right&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
            &lt;span class="s"&gt;"  -u      turn up&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
            &lt;span class="s"&gt;"  -d      turn down&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
            &lt;span class="s"&gt;"  -f      fire&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
            &lt;span class="s"&gt;"  -t      specify duration in milli seconds&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
            &lt;span class="s"&gt;"  -h      display this help&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
            &lt;span class="s"&gt;"notes:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
            &lt;span class="s"&gt;"* it is possible to combine the directions of the two axes, e.g.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
            &lt;span class="s"&gt;"  '-lu' send_cmds the missile launcher up and left at the same time.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
            &lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;argc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[])&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;fd&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;cmd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ML_STOP&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;duration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DEFAULT_DURATION&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;dev&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DEFAULT_DEVICE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;argc&lt;/span&gt; &lt;span class="o"&gt;&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;getopt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;argc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"mslrudfht:"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;switch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="sc"&gt;'m'&lt;/span&gt;: &lt;span class="n"&gt;dev&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;optarg&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                      &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="sc"&gt;'l'&lt;/span&gt;: &lt;span class="n"&gt;cmd&lt;/span&gt; &lt;span class="o"&gt;|=&lt;/span&gt; &lt;span class="n"&gt;ML_LEFT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                      &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="sc"&gt;'r'&lt;/span&gt;: &lt;span class="n"&gt;cmd&lt;/span&gt; &lt;span class="o"&gt;|=&lt;/span&gt; &lt;span class="n"&gt;ML_RIGHT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                      &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="sc"&gt;'u'&lt;/span&gt;: &lt;span class="n"&gt;cmd&lt;/span&gt; &lt;span class="o"&gt;|=&lt;/span&gt; &lt;span class="n"&gt;ML_UP&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                      &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="sc"&gt;'d'&lt;/span&gt;: &lt;span class="n"&gt;cmd&lt;/span&gt; &lt;span class="o"&gt;|=&lt;/span&gt; &lt;span class="n"&gt;ML_DOWN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                      &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="sc"&gt;'f'&lt;/span&gt;: &lt;span class="n"&gt;cmd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ML_FIRE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                      &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="sc"&gt;'s'&lt;/span&gt;: &lt;span class="n"&gt;cmd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ML_STOP&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                      &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="sc"&gt;'t'&lt;/span&gt;: &lt;span class="n"&gt;duration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;atoi&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;optarg&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
                      &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="nl"&gt;default:&lt;/span&gt; &lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;fd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;O_RDWR&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fd&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;perror&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"open"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;send_cmd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cmd&lt;/span&gt; &lt;span class="o"&gt;&lt;/span&gt; &lt;span class="n"&gt;ML_FIRE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;duration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ML_FIRE_DELAY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cmd&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;ML_UP&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;cmd&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;ML_DOWN&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;duration&lt;/span&gt; &lt;span class="o"&gt;/=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;usleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;duration&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="n"&gt;send_cmd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ML_STOP&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="n"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fd&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;EXIT_SUCCESS&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This tool, let&amp;#x2019;s name it &lt;code&gt;ml_control&lt;/code&gt;, allows the user to send data to the
device via the &lt;em&gt;write&lt;/em&gt; syscall. For example, the device moves three seconds up
and left with &lt;code&gt;./ml_control -ul -t 3000&lt;/code&gt;, shoots with &lt;code&gt;./ml_control -f&lt;/code&gt;, or
stop with &lt;code&gt;./ml_control -s&lt;/code&gt;.  Consider the code as proof of concept, of course
more sophisticated applications are imaginable.&lt;/p&gt;

&lt;p&gt;&lt;img src="http://matthias.vallentin.net/blog/2007/04/hands-up.png" alt="Hands up!" style="float: right; margin-left: 10px; margin-bottom: 10px" align="right"&gt;

&lt;h1 id="summary"&gt;Summary&lt;/h1&gt;

&lt;p&gt;In this article, I frame the creation of a USB device driver for the Linux
kernel. At first I reverse-engineer the unknown USB protocol by intercepting
all USB traffic to and from the device with the Windows driver. Having captured
the complete communication primitives, I explain how to build a USB kernel
driver. Finally, a proof-of-conecpt user-space tool is presented that lays the
foundation stone for further fancy ideas. Future work touches topics like
augmenting the missile launcher with a video camera or mounting it on arbitrary
devices. The code from this article and a full implementation of the device
driver is available at my &lt;a href="https://github.com/mavam/ml-driver"&gt;github repository&lt;/a&gt;.&lt;/p&gt;&lt;/p&gt;&lt;/p&gt;&lt;/p&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/blog-atom/~4/tGrePi05t1k" height="1" width="1"/&gt;</content>
    <summary type="html">&lt;p&gt;This article explains the creation process of a Linux kernel device driver for
an undocumented USB device. After having reverse-engineered the USB
communication protocol, I present the architecture of the USB device driver. In
addition to the kernel driver I introduce a simple user-space tool that can be
used to control the device. Although I have to delve into the specifics of a
particular device, the process can be applied to other USB devices as well.&lt;/p&gt;</summary>
  <feedburner:origLink>http://matthias.vallentin.net/blog/2007/04/writing-a-linux-kernel-driver-for-an-unknown-usb-device/</feedburner:origLink></entry>
  <entry>
    <id>tag:matthias.vallentin.net,2007-01-06:/blog/2007/01/examinig-and-dissecting-tcpdump-libpcap-traces/</id>
    <title type="html">Examining and Dissecting tcpdump/libpcap Traces</title>
    <published>2007-01-06T08:00:00Z</published>
    <updated>2007-01-06T08:00:00Z</updated>
    <link rel="alternate" href="http://feeds.matthias.vallentin.net/~r/blog-atom/~3/2hXvdGDyeUo/" />
    <content type="html">&lt;p&gt;Almost every network research involves trace-based analysis with a captured
stream of packets. This article presents a set of analysis tools which extract
detailed information from &lt;a href="http://www.tcpdump.org"&gt;tcpdump/libpcap&lt;/a&gt; traces.&lt;/p&gt;

&lt;p&gt;Having a pcap trace in place, it is about to gather futher details from it,
e.g., the traffic peak rate or which protocol accounts for the largest share.
Many tools exist to accomplish this task. Below I will describe three tools
that particularly caught my attention while working with network traffic:
tcpdstat, ipsumdump, and Netdude.&lt;/p&gt;

&lt;h1 id="tcpdstat"&gt;tcpdstat&lt;/h1&gt;

&lt;p&gt;Written by Kenjiro Cho,
&lt;a href="http://www.sonycsl.co.jp/~kjc/papers/freenix2000/node14.html"&gt;tcpdstat&lt;/a&gt;
is a powerful  tool that performs an in-depth protocol breakdown by bytes and
packets. It further displays average and maximum transfer rates, IP flow
information, and packet size distribution. Dave Dittrich applied several
&lt;a href="http://staff.washington.edu/dittrich/talks/core02/tools/tools.html"&gt;tweaks&lt;/a&gt;
the tool to support a broader range of protocols and services, and to report
more details about flow rates.&lt;/p&gt;

&lt;p&gt;Here is an example output (of Dave&amp;#x2019;s enhanced version):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;DumpFile:  trace.pcap
FileSize: 98876.89MB
Id: 200703011241
StartTime: (anonymized)
EndTime:   (anonymized)
TotalTime: 7216.13 seconds
TotalCapSize: 96826.91MB  CapLen: 1514 bytes
# of packets: 134347439 (96826.91MB)
AvgRate: 113.10Mbps  stddev:47.96M   PeakRate: 260.92Mbps

### IP flow (unique src/dst pair) Information ###
# of flows: 1612801  (avg. 83.30 pkts/flow)
Top 10 big flow size (bytes/total in %):
 33.6%  3.2%  2.2%  1.5%  1.4%  1.0%  1.0%  0.9%  0.8%  0.8%

### IP address Information ###
# of IPv4 addresses: 480065
Top 10 bandwidth usage (bytes/total in %):
 34.4% 34.4%  3.3%  3.3%  3.0%  2.7%  2.3%  1.8%  1.5%  1.5%

### Packet Size Distribution (including MAC headers) ###
&amp;lt;&amp;lt;&amp;lt;&amp;lt;
 [   32-   63]:   20839652
 [   64-  127]:   38798140
 [  128-  255]:    3947049
 [  256-  511]:    3746280
 [  512- 1023]:    5675556
 [ 1024- 2047]:   61340762
&amp;gt;&amp;gt;&amp;gt;&amp;gt;

### Protocol Breakdown ###
&amp;lt;&amp;lt;&amp;lt;&amp;lt;
     protocol           packets                 bytes           bytes/pkt
------------------------------------------------------------------------
[0] total        134347439 (100.00%)     101530372750 (100.00%)    755.73
[1] ip           134347439 (100.00%)     101530372750 (100.00%)    755.73
[2]  tcp         118172509 ( 87.96%)      97361936181 ( 95.89%)    823.90
[3]   ftpdata        18640 (  0.01%)         16529412 (  0.02%)    886.77
[3]   ftp            72372 (  0.05%)          4697330 (  0.00%)     64.91
[3]   ssh         13849679 ( 10.31%)      11113777353 ( 10.95%)    802.46
[3]   telnet          9007 (  0.01%)          1526445 (  0.00%)    169.47
[3]   smtp         2133471 (  1.59%)       1447293494 (  1.43%)    678.38
[3]   name              23 (  0.00%)             1426 (  0.00%)     62.00
[3]   dns            35071 (  0.03%)          7071657 (  0.01%)    201.64
[3]   http(s)     25043480 ( 18.64%)      30677552254 ( 30.22%)   1224.97
[3]   http(c)     16165378 ( 12.03%)       2182851897 (  2.15%)    135.03
[3]   kerb5            370 (  0.00%)            30610 (  0.00%)     82.73
[3]   pop3           82382 (  0.06%)         26718043 (  0.03%)    324.32
[3]   sunrpc            30 (  0.00%)             3002 (  0.00%)    100.07
[3]   ident           5107 (  0.00%)           322074 (  0.00%)     63.07
[3]   nntp            1262 (  0.00%)           292679 (  0.00%)    231.92
[3]   epmap         209144 (  0.16%)         12909976 (  0.01%)     61.73
[3]   netb-se       404237 (  0.30%)         47178014 (  0.05%)    116.71
[3]   imap          125983 (  0.09%)        100889454 (  0.10%)    800.82
[3]   bgp              482 (  0.00%)            43139 (  0.00%)     89.50
[3]   ldap            7131 (  0.01%)          1434769 (  0.00%)    201.20
[3]   https        2941177 (  2.19%)       1802114169 (  1.77%)    612.72
[3]   ms-ds         245214 (  0.18%)         24263111 (  0.02%)     98.95
[3]   rtsp         1023246 (  0.76%)        691696863 (  0.68%)    675.98
[3]   ldaps           2828 (  0.00%)           209272 (  0.00%)     74.00
[3]   socks           7883 (  0.01%)          1340672 (  0.00%)    170.07
[3]   kasaa          13348 (  0.01%)          1124944 (  0.00%)     84.28
[3]   mssql-s       309786 (  0.23%)         20411848 (  0.02%)     65.89
[3]   squid          51381 (  0.04%)         14079861 (  0.01%)    274.03
[3]   ms-gc           1865 (  0.00%)           493682 (  0.00%)    264.71
[3]   ms-gcs          2034 (  0.00%)           481178 (  0.00%)    236.57
[3]   hotline            6 (  0.00%)              682 (  0.00%)    113.67
[3]   realaud        19784 (  0.01%)         13197979 (  0.01%)    667.10
[3]   icecast       390203 (  0.29%)        291651836 (  0.29%)    747.44
[3]   gnu6346         6324 (  0.00%)          1048473 (  0.00%)    165.79
[3]   gnu6348          342 (  0.00%)            26047 (  0.00%)     76.16
[3]   gnu6349           14 (  0.00%)             2767 (  0.00%)    197.64
[3]   gnu6350            4 (  0.00%)              732 (  0.00%)    183.00
[3]   irc6666            7 (  0.00%)              434 (  0.00%)     62.00
[3]   irc6667         1379 (  0.00%)           196155 (  0.00%)    142.24
[3]   irc6668            2 (  0.00%)              124 (  0.00%)     62.00
[3]   irc6669            9 (  0.00%)              666 (  0.00%)     74.00
[3]   napster           21 (  0.00%)             1344 (  0.00%)     64.00
[3]   irc7000            7 (  0.00%)              824 (  0.00%)    117.71
[3]   http-a        129807 (  0.10%)         71136838 (  0.07%)    548.02
[3]   other       54862568 ( 40.84%)      48787331392 ( 48.05%)    889.26
[2]  udp          13069221 (  9.73%)       3895596348 (  3.84%)    298.07
[3]   name              18 (  0.00%)             1989 (  0.00%)    110.50
[3]   dns          1799081 (  1.34%)        264263480 (  0.26%)    146.89
[3]   kerb5            100 (  0.00%)            25812 (  0.00%)    258.12
[3]   sunrpc           581 (  0.00%)            57157 (  0.00%)     98.38
[3]   ntp            50387 (  0.04%)          4534933 (  0.00%)     90.00
[3]   epmap             17 (  0.00%)             1824 (  0.00%)    107.29
[3]   netb-ns       148619 (  0.11%)         14736588 (  0.01%)     99.16
[3]   netb-se         1272 (  0.00%)           328673 (  0.00%)    258.39
[3]   ms-ds              8 (  0.00%)              883 (  0.00%)    110.38
[3]   kazaa             29 (  0.00%)             3546 (  0.00%)    122.28
[3]   mssql-s           44 (  0.00%)             3832 (  0.00%)     87.09
[3]   mcast        7216682 (  5.37%)       1943012688 (  1.91%)    269.24
[3]   realaud       459195 (  0.34%)        273532235 (  0.27%)    595.68
[3]   halflif           81 (  0.00%)             5890 (  0.00%)     72.72
[3]   starcra           45 (  0.00%)             6367 (  0.00%)    141.49
[3]   everque            9 (  0.00%)             1351 (  0.00%)    150.11
[3]   unreal          1066 (  0.00%)            93951 (  0.00%)     88.13
[3]   quake             20 (  0.00%)             1860 (  0.00%)     93.00
[3]   other        3384119 (  2.52%)       1394472416 (  1.37%)    412.06
[2]  icmp          3105709 (  2.31%)        272840221 (  0.27%)     87.85
[2]  frag            30903 (  0.02%)         25672129 (  0.03%)    830.73
&amp;gt;&amp;gt;&amp;gt;&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;At first, tcpdstat shows a summary including the trace size, time, and number
of packets. Most interesting here is the average and peak traffic rate in Mbps
and the standard deviation, measuring how bursty the traffic was. The second
block shows the number of IP flows (i.e., the pair of src and dst address) and
the 10 biggest flows in percent. Next, the total disctinct IP addresses are
displayed along with the top 10 bandwith usage in percent. Unfortunately, only
the percent values are shown and not the addresses themselves. The next block
displays the packet size distribution including MAC headers. The last big block
is the very informative, illustrating a per-packet and per-byte protocol
breakdown. The IP protocol is partitioned in TCP, UDP, ICMP, and fragmented
packets. TCP and UDP packets are further broken down into their application
layer protocols.&lt;/p&gt;

&lt;p&gt;Tcpdstat is great to get a bunch of information out of a trace, very easy to
use, but it lacks some flexibility. The tool is your perfect choice if the
above output is enough for you. Otherwise, you might incorporate further tools
into your analysis.&lt;/p&gt;

&lt;h1 id="ipsumpdump"&gt;ipsumdump&lt;/h1&gt;

&lt;p&gt;The &lt;a href="http://www.cs.ucla.edu/~kohler/ipsumdump"&gt;ipsumpdump&lt;/a&gt; utility from Eddie
Kohler is the &amp;#x201C;swiss-army knife&amp;#x201D; for trace analysis.
It&lt;strong&gt;sum&lt;/strong&gt;marizes TCP/&lt;strong&gt;IP dump&lt;/strong&gt; files into a nice ASCII format that can be
intuitively understood and easily parsed.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Ipsumdump can read packets from network interfaces, from tcpdump files, and
from existing ipsumdump files. It will transparently uncompress tcpdump or
ipsumdump files when necessary. It can randomly sample traffic, filter
traffic based on its contents, anonymize IP addresses, and sort packets from
multiple dumps by timestamp. Also, it can optionally create a tcpdump file
containing actual packet data.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To demonstrate its versatility, I provided the output of &lt;code&gt;ipsumdump-h&lt;/code&gt; below:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;'Ipsumdump' reads IP packets from tcpdump(1) files, or network interfaces,
and summarizes their contents in an ASCII log.

Usage: ipsumdump [CONTENT OPTIONS] [-i DEVNAMES | FILES] &amp;gt; LOGFILE

Options that determine summary dump contents (can give multiple options):
  -t, --timestamp            Include packet timestamps.
  -T, --first-timestamp      Include flow-begin timestamps.
  -s, --src                  Include IP source addresses.
  -d, --dst                  Include IP destination addresses.
  -S, --sport                Include TCP/UDP source ports.
  -D, --dport                Include TCP/UDP destination ports.
  -l, --length               Include IP lengths.
  -p, --protocol             Include IP protocols.
      --id                   Include IP IDs.
  -g, --fragment             Include IP fragment flags ('F' or '.').
  -G, --fragment-offset      Include IP fragment offsets.
      --ip-sum               Include IP checksums.
      --ip-opt               Include IP options.
  -F, --tcp-flags            Include TCP flags word.
  -Q, --tcp-seq              Include TCP sequence numbers.
  -K, --tcp-ack              Include TCP acknowledgement numbers.
  -W, --tcp-window           Include TCP receive window (unscaled).
  -O, --tcp-opt              Include TCP options.
      --tcp-sack             Include TCP selective acknowledgement options.
      --udp-length           Include UDP lengths.
  -L, --payload-length       Include payload lengths (no IP/UDP/TCP headers).
      --payload              Include packet payloads as quoted strings.
      --payload-md5          Include MD5 checksum of packet payloads.
      --capture-length       Include lengths of captured IP data.
  -c, --packet-count         Include packet counts (usually 1).
      --link                 Include link numbers (NLANR/NetFlow).

Data source options (give exactly one):
  -r, --tcpdump              Read tcpdump(1) FILES (default).
  -i, --interface            Read network devices DEVNAMES until interrupted.
      --ipsumdump            Read existing ipsumdump FILES.
      --format FORMAT        Read ipsumdump FILES with format FORMAT.
      --dag[=ENCAP]          Read DAG-format FILES.
      --nlanr                Read NLANR-format FILES (fr/fr+/tsh).
      --netflow-summary      Read summarized NetFlow FILES.
      --tcpdump-text         Read tcpdump(1) text output FILES.

Other options:
  -o, --output FILE          Write summary dump to FILE (default stdout).
  -b, --binary               Create binary output file.
  -w, --write-tcpdump FILE   Also dump packets to FILE in tcpdump(1) format.
  -f, --filter FILTER        Apply tcpdump(1) filter FILTER to data.
  -A, --anonymize            Anonymize IP addresses (preserves prefix &amp;amp; class).
      --no-promiscuous       Do not put interfaces into promiscuous mode.
      --bad-packets          Print '!bad' messages for bad headers.
      --sample PROB          Sample packets with PROB probability.
      --multipacket          Produce multiple entries for a flow identifier
                             representing multiple packets (NetFlow only).
      --collate              Collate packets from data sources by timestamp.
      --interval TIME        Stop after TIME has elapsed in trace time.
      --limit-packets N      Stop after processing N packets.
      --map-address ADDRS    When done, print to stderr the anonymized IP
                             addresses and/or prefixes corresponding to ADDRS.
      --record-counts TIME   Record packet counts every TIME seconds in output.
      --random-seed SEED     Set random seed to SEED (default is random).
      --no-mmap              Don't memory-map input files.
  -q, --quiet                Do not print progress bar.
      --config               Output Click configuration and exit.
  -V, --verbose              Report errors verbosely.
  -h, --help                 Print this message and exit.
  -v, --version              Print version number and exit.

Report bugs to &amp;lt;kohler@cs.ucla.edu&amp;gt;.
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Among the above options, some deserve more attention. For example, the
&lt;code&gt;--payload-md5&lt;/code&gt; option includes a MD5 checksum (e.g. i4CxGSojVHB2XcZw97ZpQb)
of the packet payload in the dump. This option comes in handy when you want to
check for packet duplicates. Another nifty option is &lt;code&gt;--anonymize&lt;/code&gt;.  Since
traces can contain sensitive data, it is possible to anonymize the IP addresses
in the output in order to prevent information leakage. The anonymization
preserves prefix and class. In high-volume networks, the
&lt;code&gt;--sample=p&lt;/code&gt; option might be interesting. It samples packets with probability
&lt;em&gt;p&lt;/em&gt;. That is, &lt;em&gt;p&lt;/em&gt; is the chance that a packet will cause output to be
generated. The actual probability may differ from the specified probability,
due to fixed point arithmetic. If you want to merge several trace files
retaining the temporal order, &lt;code&gt;--collate&lt;/code&gt; together with &lt;code&gt;--write-tcpdump&lt;/code&gt; sorts
your packets with increasing timestamp.&lt;/p&gt;

&lt;p&gt;Summing up, ipsumdump is a flexible trace analysis tool complementing tcpdump.
However, it does not feature predefined evaluation methods like
[tcpdstat][tcpdstat] (which is actually not a design goal).  Nevertheless,
ipsumdump is a valuable tool to quickly generate easily readable ASCII output
or to obtain well-formatted output for subsequent processing/scripting. It
unveils its real power when used for trace manipulation such as merging,
modifying, or anonymizing tcpdump traces.&lt;/p&gt;

&lt;h1 id="netdude"&gt;Netdude&lt;/h1&gt;

&lt;p&gt;&lt;a href="http://netdude.sourceforge.net"&gt;Netdude&lt;/a&gt; (&lt;strong&gt;Net&lt;/strong&gt;work &lt;strong&gt;du&lt;/strong&gt;mp data
&lt;strong&gt;d&lt;/strong&gt;isplayer and &lt;strong&gt;e&lt;/strong&gt;ditor) is a graphical tool to edit tcpdump trace files,
written by &lt;a href="http://www.icir.org/christian"&gt;Christian Kreibich&lt;/a&gt;. In fact, it is
a front-end to the &lt;a href="http://netdude.sourceforge.net/doco/libnetdude"&gt;libnetdude&lt;/a&gt;
packet manipulation library. Since complex trace manipulation is non-trivial
and often requires custom coding, Netdude provides a GUI enabling users to&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;edit traces of arbitrary size in a scalable fashion. Netdude never loads
more than a configurable maximum number of packets into memory at any time.&lt;/li&gt;
  &lt;li&gt;edit multiple traces at the same time, making it easy to move packets from
one trace to a different one.&lt;/li&gt;
  &lt;li&gt;Modify every field in protocol headers for which a protocol plugin provides
support. These modifications can be applied to either only individually
selected packets, packets currently in memory, or all packets in the trace,
including the ones not currently loaded.&lt;/li&gt;
  &lt;li&gt;Filter packets by using filter plugins. Netdude 0.4.6 ships with a BPF filter
plugin that allows you to use the standard BPF filter language to define
your filters.&lt;/li&gt;
  &lt;li&gt;Inspect and edit raw packet content using Netdude&amp;#x2019;s payload editor in
either hex or ASCII mode whichever is more convenient for the payload you are
editing.&lt;/li&gt;
  &lt;li&gt;Move packets around, duplicate them, remove them from traces.&lt;/li&gt;
  &lt;li&gt;See the tcpdump output updating instantly according to the modifications
you&amp;#x2019;re making.&lt;/li&gt;
  &lt;li&gt;Conveniently use the clipboard to select lines from the tcpdump output for
situations when you need the tcpdump output only (e.g., when writing
documentation, papers or emails).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As soon as packet editing is part of the game, Netdude offers an interactive
and user-friendly GUI to perform even sophisticated manipulations. The
screenshot below illustrates how one can graphically manipulate the TCP header
fields.&lt;/p&gt;

&lt;p&gt;&lt;img src="http://matthias.vallentin.net/blog/2007/01/netdude.gif" alt="Fiddling with TCP header fields in Netdude" style="display: block; margin: 0 auto;"&gt;&lt;/p&gt;

&lt;h1 id="conclusion"&gt;Conclusion&lt;/h1&gt;

&lt;p&gt;Nowadays, captured network traffic is mostly available as a tcpdump/libpcap
trace. In this article, I introduce three tools enabling in-depth trace
examination. At first, the tool tcpdstat provides a high-level view of the
trace ingredients with a detailed protocol breakdown. Second, ipsumdump offers
a flexible means to generate nicely formatted ASCII dump. It can be used to
quickly extract a desired piece of information or as a multi-purpose output
generator. Finally, Netdude features a comfortable GUI to selectively
manipulate packet details. Equipped with this arsenal, trace-based network
analysis feels like a hot knife through butter.&lt;/p&gt;
&lt;img src="http://feeds.feedburner.com/~r/blog-atom/~4/2hXvdGDyeUo" height="1" width="1"/&gt;</content>
    <summary type="html">&lt;p&gt;Almost every network research involves trace-based analysis with a captured
stream of packets. This article presents a set of analysis tools which extract
detailed information from &lt;a href="http://www.tcpdump.org"&gt;tcpdump/libpcap&lt;/a&gt; traces.&lt;/p&gt;</summary>
  <feedburner:origLink>http://matthias.vallentin.net/blog/2007/01/examinig-and-dissecting-tcpdump-libpcap-traces/</feedburner:origLink></entry>
</feed>

