<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Small Golden Sceptre &#187; Privacy</title>
	<atom:link href="http://mythopoeic.org/tag/privacy/feed/" rel="self" type="application/rss+xml" />
	<link>http://mythopoeic.org</link>
	<description>Technology, Rambling and Dragons</description>
	<lastBuildDate>Fri, 03 Feb 2012 03:00:56 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>MediaWiki: Creating a Private Wiki</title>
		<link>http://mythopoeic.org/mediawiki-private/</link>
		<comments>http://mythopoeic.org/mediawiki-private/#comments</comments>
		<pubDate>Tue, 21 Sep 2010 19:01:51 +0000</pubDate>
		<dc:creator>dhenke</dc:creator>
				<category><![CDATA[Compugeekery]]></category>
		<category><![CDATA[mediawiki]]></category>
		<category><![CDATA[perl script]]></category>
		<category><![CDATA[Privacy]]></category>

		<guid isPermaLink="false">http://mythopoeic.org/?p=878</guid>
		<description><![CDATA[MediaWiki is the software behind Wikipedia, but you can use it to create your own special-purpose sites. I&#8217;ve used it at work to build an internal company knowledge base, and I&#8217;m using it at home to make a Wiki for the fictional world of a roleplaying game I&#8217;m in. It&#8217;s a pretty polished software package, [...]]]></description>
			<content:encoded><![CDATA[<p>MediaWiki is the software behind Wikipedia, but you can use it to create your own special-purpose sites. I&#8217;ve used it at work to build an internal company knowledge base, and I&#8217;m using it at home to make a Wiki for the fictional world of a roleplaying game I&#8217;m in.</p>
<p>It&#8217;s a pretty polished software package, but out of the box it tends to assume that you are creating something like Wikipedia that is visible to (and editable by) the whole wide world. If that&#8217;s not what you want, it requires some tuning, which I&#8217;ll describe in detail after the jump.</p>
<p><span id="more-878"></span></p>
<h2>Why MediaWiki?</h2>
<p>An obvious question is, if the MediaWiki software needs tweaking to do what I want, why do I use it and not some other software? A few reasons:</p>
<ul>
<li>It&#8217;s just a Wiki. I&#8217;ve tried doing similar things with other packages (such as <a href="http://info.tiki.org/tiki-index.php">Tiki Wiki</a>), and ended up wasting lots of time fighting with or trying to turn off features I didn&#8217;t want or need. A CMS, blogging package or groupware suite is great if you need one, but each tends to assume a certain workflow and division of responsibility. If you are just trying to create a Wiki, you&#8217;ll waste time swimming against the current.</li>
<li>The Wikitext markup language is reasonably intuitive (simple things are simple, and new users can start contributing right away) but still powerful enough to create complex pages. It&#8217;s incredibly well-tested, and doesn&#8217;t flake out in odd corner-cases. There&#8217;s a huge volume of example pages out there.</li>
<li>It&#8217;s decently fast even on a modest computer, without resorting to esoteric fastcgi and caching tricks.</li>
<li>There are lots of useful plugins available.</li>
<li>Because of the huge user base, most &#8220;How do I&#8230;?&#8221; questions are answered with a simple web search.</li>
</ul>
<h2>Prerequisites</h2>
<p>The remainder of this article will assume that you&#8217;re starting with a fresh install of MediaWiki, and have an admin user set up (and know its password). You&#8217;ll also need the ability to edit files under the MediaWiki install tree (via command-line access or FTP or any other means).</p>
<p>While the methods I describe should be reasonably general, future releases may involve different ways of doing things. The process described below was tested and is known to work with the following software versions:</p>
<ul>
<li>MediaWiki 1.16.0</li>
<li>PHP 5.3.3 (cgi-fcgi)</li>
<li>MySQL 4.1.20</li>
<li>Perl 5.8.8</li>
<li>Apache 2.2.8</li>
</ul>
<h2>Restricting Access to Registered Users Only</h2>
<p>The first step in creating a private Wiki (and one which is well-documented elsewhere) is to make it so that only logged-in users can view or edit pages (other than the login page), and so that only sysops can create new accounts.</p>
<p>Edit your LocalSettings.php to include the following lines (I made all manual additions at the bottom of the file):</p>
<pre>
# Disable reading by anonymous users
$wgGroupPermissions['*']['read'] = false;

# But allow access to the login page or there will be no way to log in!
$wgWhitelistRead = array("Special:Userlogin", "MediaWiki:Common.css",
"MediaWiki:Common.js", "MediaWiki:Monobook.css", "MediaWiki:Monobook.js", "-");

# Disable anonymous editing
$wgGroupPermissions['*']['edit'] = false;

# Prevent new user registrations except by sysops
$wgGroupPermissions['*']['createaccount'] = false;
</pre>
<h2>Disabling &#8220;Remember Passwords&#8221; in MediaWiki</h2>
<p>That&#8217;s a good start, but there&#8217;s still a big problem: the &#8220;Remember my login on this computer&#8221; checkbox on the login form. If the user checks this (and users, being users, will check it), it stores a password-equivalent token in a cookie on the user&#8217;s browser. That means that anyone with access to the user&#8217;s computer (which, for &#8216;doze boxes, is likely the whole Internet) can use your Wiki as that user.</p>
<p>Having the browser automatically log the user in isn&#8217;t quite the same as having no password at all, but it&#8217;s pretty bad. It&#8217;s giving up a huge degree of security for a very minor convenience. If what you&#8217;re protecting has any value at all, you don&#8217;t want it.</p>
<p>So how to get rid of this misfeature? Add the following to your LocalSettings.php:</p>
<pre>
# Disable "remember password" on login page:
require_once('extensions/NoRememberAuthPlugin.php');
$wgAuth = new NoRememberAuthPlugin();

# Disable "remember password" on user preferences page:
$wgHooks['GetPreferences'][] = 'NoRememberPrefHook';
function NoRememberPrefHook($user, &amp;$preferences) {
    unset($preferences['rememberpassword']);
    return true;
}
</pre>
<p>Add a file extensions/NoRememberAuthPlugin.php containing the following:</p>
<pre>
&lt;?php
require_once ('AuthPlugin.php');
class NoRememberAuthPlugin extends AuthPlugin {
   function modifyUITemplate(&amp; $template) {
      //disable 'remember me' box
      $template-&gt;set('remember', false);
      $template-&gt;set('canremember', false);
   }
}
?&gt;
</pre>
<p>Notice that we also keep users from turning &#8220;remember me&#8221; back on in their preferences, and disable it (at the next login) for any users who had it enabled before we made the change.</p>
<p>Also notice that all the modifications so far have been edits to LocalSettings.php and the creation of a new extension. These are all things that should survive a MediaWiki upgrade intact.</p>
<h2>Disabling &#8220;Remember Passwords&#8221; in the Browser</h2>
<p>Now that we&#8217;ve stopped the MediaWiki software from sabotaging its own security, our next challenge is the user&#8217;s web browser. The browser may have its own separate &#8220;remember password&#8221; (aka &#8220;form fill in&#8221; or &#8220;wallet&#8221;) feature, and some users leave that turned on. We can&#8217;t really check for or enforce a particular setting; there&#8217;s no way, from the server end, to tell if the credentials were supplied via keyboard or autocompleted by the browser.</p>
<p>One approach we could use (and which I did use, on a previous project) is to randomize the URL of the login page or the field names or both. The browser can still &#8220;remember&#8221; the password, but won&#8217;t know to use it on the next login. On a scale from good to bad, this is bad: password-equivalent data is still sitting on the user&#8217;s disk. It&#8217;s also not easy (for me) to implement within MediaWiki.</p>
<p>Fortunately, there is another option: we can put a hint in the page source to tell the browser to turn off &#8220;remember password&#8221; for a particular form. We do this by adding an <strong>autocomplete</strong> attribute to the <strong>form</strong> element, with a value of <strong>off</strong>.</p>
<p>There are a couple of disadvantages to this approach: First, autocomplete=&#8221;off&#8221; is not part of any web standard (though it is supported in both Gecko-engine browsers like Firefox and in IE). Second, I didn&#8217;t see any obvious way to add it to the necessary places using LocalSettings.php or an extension.</p>
<p>There are two places where the form-fill-in behavior is problematic: on the user login screen, and on the screen where the user can change his password. For the user login screen, edit the file <strong>includes/templates/Userlogin.php</strong>, find the form with <strong>name=&#8221;userlogin&#8221;</strong> and add the <strong>autocomplete=&#8221;off&#8221;</strong> attribute. (You may want to back up the file first.) Here is part of a unified diff from my installation:</p>
<pre>
-&lt;form name="userlogin" method="post" action="&lt;?php $this-&gt;text('action') ?&gt;"&gt;
+&lt;form name="userlogin" method="post" action="&lt;?php $this-&gt;text('action') ?&gt;" autocomplete="off"&gt;
</pre>
<p>For the password change, edit the file <strong>includes/specials/SpecialResetpass.php</strong>. Look for the code that builds an array with an element <strong>&#8216;id&#8217; =&gt; &#8216;mw-resetpass-form&#8217;</strong> and add a new element <strong>&#8216;autocomplete&#8217; =&gt; &#8216;off&#8217;</strong>. Another diff excerpt:</p>
<pre>
-                    'id' =&gt; 'mw-resetpass-form' ) ) . "\n" .
+                    'id' =&gt; 'mw-resetpass-form',
+                    'autocomplete' =&gt; 'off' ) ) . "\n" .
</pre>
<p>When you are testing this (and you should test it), remember to turn the &#8220;remember password&#8221; feature of your browser back off when you are done.</p>
<h2>Enforcing Password Quality</h2>
<p>Another weak link in the chain is password quality. People tend to pick bad, easily-guessed passwords. While we would like for them to not do that, it requires a delicate balancing act. If the password rules are too strict, users won&#8217;t be able to pick passwords they can remember, and will end up writing them down (which is worse in some ways than picking a weak password). If we pick really silly rules, it can dramatically narrow the search space for an attacker trying to find passwords by brute force.</p>
<p>On the other hand, using password quality rules has another benefit beyond making sure passwords are harder to guess: it helps ensure that users don&#8217;t use the same password for multiple unrelated sites.</p>
<p>There are MediaWiki extensions (like <a href="http://www.mediawiki.org/wiki/Extension:SafeCreate">SafeCreate</a>) which enforce password rules, but I didn&#8217;t really like the way any of them work (and my PHP skills aren&#8217;t very strong). So, instead I came up with a way to call an arbitrary, external program to test potential new passwords. Here is the code (to be added to LocalSettings.php):</p>
<pre># Enforce strong passwords when password is changed:
$wgHooks['PrefsPasswordAudit'][] = 'ChkStrongPassword';
function ChkStrongPassword($user, $newPass, $error) {
   $output = array();

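   # Only audit attempts that MediaWiki itself accepted
   if($error !== 'success') { return true; }
   # Pass the candidate via the environment, then clear it again
   putenv("CHKPASS=$newPass");
   exec("/home/mywikiuser/bin/chkpass", $output, $rtn);
   putenv("CHKPASS");
   if($rtn) {
      # Non-zero exit: show the explanation from chkpass to the user
      throw new PasswordError(implode(' ', $output));
   }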

   return true;
}
</pre>
<p>We are calling the program /home/mywikiuser/bin/chkpass to check passwords. This program expects the environment variable CHKPASS to be set to the potential new password. (We pass it in via the environment rather than the command line because other interactive users on the server may be able to see the command-line options of running jobs. I used an environment variable rather than stdin because it appeared easier to do in PHP.)</p>
<p>If the password meets our requirements, the chkpass program should return success (exit status 0) and emit no output. If the password fails to meet requirements, chkpass should return failure (exit status non-0) and emit to stdout a human-readable text description of what is wrong.</p>
<p>Here is a link to my sample implementation: <a href="/source-download/chkpass">chkpass</a> (1397B Perl script)</p>
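<p>To make that contract concrete, here is a minimal Bourne-shell sketch of a checker that honors it. It is not the linked Perl script: the function name <code>check_password</code> and the two rules it enforces (at least 10 characters, not digits only) are assumptions chosen purely for illustration.</p>
<pre>
#!/bin/sh
# Hypothetical stand-in for chkpass. The two rules below (minimum length,
# not digits only) are illustrative assumptions, not the linked script's rules.
check_password() {
    pw=$1
    if [ "${#pw}" -lt 10 ]; then
        echo "Password must be at least 10 characters long."
        return 1
    fi
    case $pw in
        *[!0-9]*) ;;    # at least one non-digit character: acceptable
        *)  echo "Password must not consist only of digits."
            return 1 ;;
    esac
    return 0
}

# Act only when the caller has supplied CHKPASS, as the hook above does.
if [ "${CHKPASS+set}" = set ]; then
    check_password "$CHKPASS"
    exit $?
fi
</pre>
<p>Run with CHKPASS set, the script prints nothing and exits 0 for an acceptable password, or prints one line of explanation to stdout and exits 1 otherwise, which is what the hook expects.</p>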
<h2>Requiring HTTPS</h2>
<p>Even if users have to authenticate using strong passwords that they&#8217;ve committed to memory, that accomplishes little if those passwords are flying over the &#8216;net in cleartext &#8212; or if the content protected by those passwords is sent in the clear.</p>
<p>Fortunately, there is a relatively simple solution: make sure we use HTTPS for every transaction. If you&#8217;re using Apache 2.x, this can be accomplished  by putting something like the following in the .htaccess file for your Wiki:</p>
<pre>
# If HTTPS, require a strong cipher
SSLOptions           +StrictRequire
SSLRequire           %{SSL_CIPHER_USEKEYSIZE} &gt;= 128

# Insist on HTTPS always
RewriteEngine        on
RewriteCond          %{HTTPS} !=on
RewriteRule          .* - [F]
</pre>
<p>With the above in place, any access attempt using plain old HTTP will be met with a &#8220;403 Forbidden&#8221; error. Note that you&#8217;ll need an SSL certificate to use HTTPS, either self-signed or certified by a CA. If the former, users will need to manually establish the chain of trust for your cert the first time they access your site.</p>
]]></content:encoded>
			<wfw:commentRss>http://mythopoeic.org/mediawiki-private/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Why the Password-Protected Posts?</title>
		<link>http://mythopoeic.org/why-password/</link>
		<comments>http://mythopoeic.org/why-password/#comments</comments>
		<pubDate>Thu, 04 Mar 2010 06:21:10 +0000</pubDate>
		<dc:creator>dhenke</dc:creator>
				<category><![CDATA[Administrivia]]></category>
		<category><![CDATA[change is bad]]></category>
		<category><![CDATA[Privacy]]></category>

		<guid isPermaLink="false">http://mythopoeic.org/?p=661</guid>
		<description><![CDATA[Things change, and change is bad. One bad change you may have noticed is that a handful of the posts here are now password-protected. This isn&#8217;t something I&#8217;m especially happy about, since these posts are about things I think are cool, and which I want to share with everyone. But someone smart (and with a [...]]]></description>
			<content:encoded><![CDATA[<p>Things change, and change is bad.</p>
<p>One bad change you may have noticed is that a handful of the posts here are now password-protected. This isn&#8217;t something I&#8217;m especially happy about, since these posts are about things I think are cool, and which I want to share with everyone. But someone smart (and with a significant stake in the matter) made the case that these posts also leaked information which could put me and others at risk.</p>
<p>I&#8217;ve tried to adopt the most measured response that still fixes the problem: hide only the posts I have to, and put those behind a wall rather than destroy them entirely. While I hope this is the last time I&#8217;ll have to do this retroactively, we don&#8217;t always get what we want or expect.</p>
<p>If you see a password-protected article, and have reason to believe I know and trust you, then you can always send email and ask for the password. Unless that trust is already established, though, you&#8217;re wasting your time &#8212; if it were something I could show to just anybody, I&#8217;d already be doing that.</p>
]]></content:encoded>
			<wfw:commentRss>http://mythopoeic.org/why-password/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Backups on the Home Front</title>
		<link>http://mythopoeic.org/backups-on-the-home-front/</link>
		<comments>http://mythopoeic.org/backups-on-the-home-front/#comments</comments>
		<pubDate>Sat, 19 Sep 2009 05:06:42 +0000</pubDate>
		<dc:creator>dhenke</dc:creator>
				<category><![CDATA[Compugeekery]]></category>
		<category><![CDATA[bourne shell]]></category>
		<category><![CDATA[Privacy]]></category>

		<guid isPermaLink="false">http://mythopoeic.org/?p=102</guid>
		<description><![CDATA[In computing, you have exactly two options: 1) Have current, working, tested backups or 2) don&#8217;t care if your data is there tomorrow. There is no third option. Pretending that there is leads only to substantial cussing. Unfortunately, people mostly either know this already, or won&#8217;t be convinced of it until they learn from the [...]]]></description>
			<content:encoded><![CDATA[<p>In computing, you have exactly two options: 1) Have current, working, tested backups or 2) don&#8217;t care if your data is there tomorrow. There is no third option. Pretending that there is leads only to substantial cussing.</p>
<p>Unfortunately, people mostly either know this already, or won&#8217;t be convinced of it until they learn from the school of bitter experience. So, this isn&#8217;t a post to try to convince you to take backups; it&#8217;s a post about how I do it, presented in the hopes that it&#8217;ll make doing so easier and safer. (As with many of my computer-related posts, the actual implementation is somewhat specific to UNIX-like systems, though many of the general principles apply universally.)</p>
<p><span id="more-102"></span></p>
<h2>Goals</h2>
<p>There are a few different requirements that a backup solution needs to meet &#8212; for me, anyway; your requirements may differ. These include:</p>
<ul>
<li><strong>reliability</strong> &#8212; Write-only backups are no fun. Part of reliability is <strong>testability</strong>. You can&#8217;t count a backup solution as reliable unless you can and do periodically check (by actually retrieving files) that it works. If you didn&#8217;t personally get real data out of the backup, it didn&#8217;t happen. Reliability claims of hardware or software vendors aren&#8217;t enough. Success status reported by the backup software is not enough. (Actual example: expensive enterprise-level backup software saying &#8220;complete success&#8221; day after day, when all it was configured to back up was one empty directory.)</li>
<li><strong>redundancy</strong> &#8212; While having a reliable backup system means that there&#8217;s a good chance that you&#8217;ll be able to recover what you need from a single backup disk or tape, redundancy means you have more than one such thing, and that they&#8217;re kept in different places. If all your backups are in the same building, you may be out of luck if that building burns or floods or is raided by thieves. Even backups in different places within the same city or region can all be lost if there is a widespread disaster like a hurricane.</li>
<li><strong>capacity</strong> &#8212; The backup target should be big enough to hold all my important files. For that matter, it should be big enough to hold my unimportant files too, since I don&#8217;t want to have to pick and choose. Now, there are certainly files you can re-generate or re-install from some other source. But if you&#8217;re making decisions about what to omit (to save space), it&#8217;ll inevitably bite you. Example: Why keep all those .mp3 or .flac files? They&#8217;re huge, and you can always re-rip your CD collection. That&#8217;s fine, unless the reason you&#8217;re restoring is the house fire that also destroyed your CDs.</li>
<li><strong>portability</strong> &#8212; Backups are worthless if the rare machine you need to read the media or the special software you need to restore the data were destroyed. Plan around the idea that you&#8217;re in a small town with the backup medium in your hand, a limited budget and a need to restore your data in the next few days &#8212; maybe on a public-access or borrowed computer.</li>
<li><strong>security</strong> &#8212; If your backup media are stolen or seized, the perpetrator should be prevented (by strong, peer-reviewed encryption) from gaining access to your files.</li>
<li><strong>ease of use</strong> &#8212; If making backups is a hassle, it&#8217;ll be skipped. If it isn&#8217;t automatic, it&#8217;ll be forgotten. It&#8217;s also important that restoring files be reasonably easy. On the one hand, you shouldn&#8217;t delete or overwrite files you want to keep, and you should always think before typing the command. On the other, mistakes are inevitable, and time you don&#8217;t have to spend recovering from an error is time you can spend on something productive.</li>
<li><strong>ability to keep multiple generations</strong> &#8212; Sometimes a file gets overwritten or corrupted, and stays that way for a while before you notice anything wrong. In such cases, the most recent backup will just be a faithful copy of the bad data. It&#8217;s useful to have the ability to keep multiple generations of backups, spread out over time, and to restore from one of the older ones if you like.</li>
</ul>
<h2>Physical Medium</h2>
<p>I&#8217;ve tried a number of different systems over the years, starting with &#8220;save to two different floppy disks&#8221;, then QIC tapes, various helical-scan systems, and optical media (CD-R and DVD-R). I&#8217;ve found tapes to be prohibitively expensive, painfully slow and, unless you are absolutely religious about head cleaning and replacing media on schedule, prone to failure. Optical media have limited capacity, and the most reliable such media are write-once.</p>
<p>My current solution involves using plain old ATA (or SATA) hard disks in USB drive enclosures. These are fast, capacious enough to hold multiple snapshots of my entire system, hot-pluggable and more reliable than all but the very best tapes (the drives for which I cannot afford). They&#8217;re self-contained, so I can take them (or mail them) to off-site storage. They&#8217;re cheap and use commodity hardware, so I can have a bunch of backup devices, and buy more if I think I need them.</p>
<h2>Software</h2>
<p>I keep one such drive enclosure connected to my homebox, and use a cron script to run nightly backups. The backup device has a single partition formatted as an encrypted  journaling filesystem (ext3 or reiserfs) &#8212; more details about this in a later section. The backup software is a simple Bourne-shell script I wrote:</p>
<p style="padding-left: 30px"><a href="../source-download/backup">Bourne shell script (3KB)</a></p>
<p>Note that there are configuration options within the body of the script which you must edit; do not just run this without first adapting it to your site.</p>
<p>There are substantial comments within the script itself explaining what it does in detail. In essence, it uses rsync(1) to make a snapshot of a filesystem. It uses the <code>--link-dest=</code> option to opportunistically make hard links for files which have not changed since the previous backup generation.</p>
<p>For example, say you have a 4GB movie file &#8220;cat.avi&#8221;. You make a first backup, and it puts a copy of cat.avi on the backup device. You make a second backup, and cat.avi hasn&#8217;t changed (in terms of contents, name or directory). The second backup will hard-link to cat.avi in the first backup. Now each backup contains cat.avi (with a link count of 2), and it only takes up 4GB of space on the backup (not 8GB).</p>
<p>If you edit the movie (say, by using a compositing program to add captions), and make a third backup on the same device, rsync will notice the change and make a fresh copy (without disturbing the previous two backups).</p>
<p>This combines the advantages of full backups (you need only look in one place to get the latest data, or any specific generation of data) and incremental backups (speed, efficient use of storage space on backup medium).</p>
<p>Because the backup medium is just a plain old mounted filesystem, I can navigate it and use the contents using all the same tools I use for my regular files. I can plug the enclosure into any modern Linux box with a USB port, mount the filesystem, and read my files without having to install anything special. I can do a full restore onto commodity hardware using any one of the various bootable rescue disks. Because not very many files change each night, backups after the first one are extremely fast, and a single drive can hold dozens of generations.</p>
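<p>The hard-link technique itself fits in a few lines of shell. The sketch below is not the linked script: the function name <code>snapshot</code>, the <code>latest</code> symlink and the directory layout are assumptions made for illustration. It expects an absolute path for the backup root, because rsync interprets a relative <code>--link-dest</code> path relative to the destination directory.</p>
<pre>
#!/bin/sh
# Hypothetical sketch; names and layout are assumptions, not the linked script.
# snapshot SRC ROOT NAME: copy SRC into ROOT/NAME, hard-linking files that are
# unchanged since the generation that ROOT/latest points to.
# ROOT must be an absolute path.
snapshot() {
    src=$1 root=$2 name=$3
    if [ -d "$root/latest" ]; then
        rsync -a --link-dest="$root/latest" "$src/" "$root/$name/" || return 1
    else
        # First generation: nothing to link against, so make a full copy
        rsync -a "$src/" "$root/$name/" || return 1
    fi
    # Repoint "latest" at the generation just written
    rm -f "$root/latest"
    ln -s "$name" "$root/latest"
}
</pre>
<p>Calling <code>snapshot /home /private-mnt/backup 2009-09-19</code> and then again the next night with a new name would leave two complete-looking trees that share storage for every unchanged file.</p>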
<h2>Encryption</h2>
<p>My backup media are small, portable, and a tempting target for theft or seizure. They are also &#8212; because of the requirement for geographic redundancy &#8212; kept in various places where the physical security is maybe not the best.</p>
<p>As a result, I don&#8217;t want mere possession of one of my backup disks to be sufficient to allow someone to read my files. I also don&#8217;t want to depend on a key or token that might be destroyed or stolen in the same disaster that caused the need to restore from backup in the first place. So, I depend on encryption (using a passphrase I know, which is recorded nowhere) to protect the backups.</p>
<p>I don&#8217;t want a filesystem-level encryption solution, since there are (surprisingly common) cases where metadata like the names, sizes and timestamps of files and directories are as privacy-sensitive as the contents of the files; solutions which protect only the file contents are insufficient.</p>
<p>I also consider proprietary closed-source &#8220;encryption&#8221; products to be a contradiction in terms.  You have to take the word of the vendor that the product works, and that they haven&#8217;t included back doors at the behest of some criminal enterprise (governmental or otherwise). This is one of the reasons I disdain &#8220;cloud&#8221; backup schemes: they generally ask you to trust some black-box alleged crypto, sight-unseen. (If you must use such a service, mitigate the risk by storing only big undifferentiated blobs of data you&#8217;ve encrypted yourself, using real open-source peer-reviewed crypto.)</p>
<p>Finally, I want something where the decryption software is ubiquitous, thoroughly tested, actively developed, and easy to come by if I have to do a &#8220;bare metal&#8221; restore.</p>
<p>Fortunately, <a href="http://en.wikipedia.org/wiki/Linux_Unified_Key_Setup">LUKS (Linux Unified Key Setup)</a> is an excellent solution to all of the above concerns. Plenty of detail is available on the <a href="http://code.google.com/p/cryptsetup/">project homepage</a>. A short description of the moving parts:</p>
<ul>
<li>block device driver &#8212; abstracts storage hardware (like a USB hard disk) as a standard UNIX block device with a uniform API</li>
<li><a href="http://sources.redhat.com/dm/">device mapper</a> &#8212; a generic framework for mapping one block device onto another</li>
<li><a href="http://www.saout.de/misc/dm-crypt/">dm-crypt</a> &#8212; a device mapper target that provides encryption</li>
<li>LUKS &#8212; provides standardized key management</li>
</ul>
<p>When you plug in your drive enclosure, you get a block device file like /dev/sdb1 for the first (and perhaps only) partition on the disk. That partition (representing what&#8217;s on the physical disk) holds a small LUKS header followed by ciphertext, which is statistically indistinguishable from purely random numbers.</p>
<p>Using the device mapper, dm-crypt target and cryptographic functions built in to the kernel (probably via the cryptsetup(8) command), you get another (virtual) block device file like /dev/mapper/backup, which acts as a cleartext version of your physical device. That is to say, when you read from it, a block of encrypted data is read from the physical disk, decrypted, and presented to you. When you write, your cleartext data is encrypted then written to the physical device.</p>
<p>This is handy, because you can use the device mapper virtual block device like any other block device. In particular, you can make a filesystem on it, mount it, and then it is just another directory.</p>
<p>Details depend on your distribution, but the essential steps are running <code>cryptsetup luksFormat</code> to set up the key block, encryption scheme and passphrase, then <code>cryptsetup luksOpen</code> to use your passphrase to unlock a key and create a mapping. A <a href="https://help.ubuntu.com/community/EncryptedFilesystemsOnRemovableStorage">tutorial for Ubuntu 8.04</a> is a representative example.</p>
<p>Some distros (like Ubuntu 9.04) have very nice integration of this process &#8212; all you have to do is plug in a device with an encrypted partition; you&#8217;ll be prompted for the passphrase, and your filesystem will be mounted automatically.</p>
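<p>For orientation, the one-time setup sequence looks roughly like the sketch below. It only prints the commands rather than running them, since <code>luksFormat</code> irrevocably overwrites whatever is on the partition. The device /dev/sdX1, the mapping name <code>backup</code> and the mount point are placeholders, and ext3 is used only because it is one of the filesystems mentioned earlier.</p>
<pre>
#!/bin/sh
# Prints (does not run) the one-time setup commands for an encrypted backup
# partition. Device, mapping name and mount point are placeholders; review the
# output carefully before running any of it as root.
luks_setup_plan() {
    dev=$1 name=$2 mnt=$3
    echo "cryptsetup luksFormat $dev"           # write key block, set passphrase
    echo "cryptsetup luksOpen $dev $name"       # unlock, create /dev/mapper/NAME
    echo "mkfs.ext3 /dev/mapper/$name"          # filesystem on the cleartext device
    echo "mount /dev/mapper/$name $mnt"
}

luks_setup_plan /dev/sdX1 backup /private-mnt/backup
</pre>
<p>After the first run, only the <code>luksOpen</code> and <code>mount</code> steps are repeated each time the device is attached.</p>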
<h2>Data Protection</h2>
<p>Encryption is an essential component of data protection, but isn&#8217;t a complete solution by itself. A complete discussion is beyond the scope of this post, but some things to think about:</p>
<p>If your threat matrix includes scenarios where the attacker has you as well as your backup media, consider using a <strong>duress password</strong> in combination with <strong>decoy content</strong>. This will let you appear to comply with an attacker who threatens you with harm unless you disclose your data, while simultaneously making the data unrecoverable forever (even if you change your mind or the ruse is discovered). The <a href="http://code.google.com/p/beerbottle/">beerbottle</a> project is an example implementation.</p>
<p>Data is only protected if every copy is protected. While backups deserve special consideration because they have to be in places where physical security is less well-controlled, consider also encrypting your primary filesystem. (This is especially important for laptops, which travel to all manner of sketchy places.) An unencrypted swap partition may be full of things you&#8217;d rather keep to yourself.</p>
<p>Attackers routinely steal <em>running</em> machines, both host and UPS, dragging the lot back to where memory contents can be dumped and the contents of mounted filesystems examined. A simple limit switch can cut power if the case is lifted; easy to defeat, but effective if the attacker doesn&#8217;t expect it. If you&#8217;re the suspicious sort, there are trembler switches, accelerometers, magnetometers, and really any kind of electronic widget that can detect a condition which is reasonably likely to only arise if your box is being abducted. Creativity counts for a lot; an off-the-shelf solution will have an off-the-shelf countermeasure.</p>
<h2>Controlling Writes</h2>
<p>I keep my backup device mounted most of the time. It&#8217;s nice, because users have direct access (subject to the same file permissions as the original) to all the recent generations of backup. The one disadvantage, and it&#8217;s a big one, is that if a user can write to a file he can destroy the backups of that file. Since part of the purpose of backups is to protect users from their own errors, this is a serious problem.</p>
<p>My solution is to mount the backup filesystem on a mount point that only root can reach, and use that to perform the backups. I have a read-only bind mount accessible to users. A user can read a file if the mode bits allow it, but can&#8217;t write a file no matter what the permissions are. Example commands:</p>
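<pre style="padding-left: 30px"># Read-write mount, reachable only by root
mount -t auto /dev/mapper/backup /private-mnt/backup
# Second view of the same filesystem for ordinary users
mount --bind /private-mnt/backup /mnt
# Make only the bind mount read-only (per-mount ro needs kernel 2.6.26 or later)
mount -o remount,ro,bind /mnt</pre>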
<p>In the above example, the directory /private-mnt/ is owned by root and has mode 0700 (read, write, execute only by owner). Non-root users cannot do anything beneath /private-mnt/ (including traverse the backup mount point therein). Anyone can traverse directories, read files and even run programs under /mnt/ (subject to individual file permissions), but can modify nothing (due to the read-only status of the filesystem).</p>
<h2>Scheduling</h2>
<p>Running the backup script can be done through a simple nightly cron job. Mounting the device involves a necessary manual step: entering the passphrase, which obviously cannot be kept in persistent storage. Rotating backup devices between home and remote locations also requires human effort.</p>
<p>I&#8217;m still finding the right compromise. Right now, I run (automated) backups nightly, and rotate the devices every few weeks. Each rotation requires remounting and typing the passphrase. If I have to reboot for some reason (like a lengthy power outage or a kernel update) that also requires a remount &#8212; but such events are few and far between.</p>
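<p>For illustration, the nightly job might be a crontab entry along these lines. This is only a sketch: the script path and the 02:30 run time are made-up placeholders, not the actual setup. Cron mails any output a job produces to the <code>MAILTO</code> address, so a script that stays quiet on success and prints something on failure gives you failure mail for free.</p>
<pre style="padding-left: 30px"># root's crontab (edit with: crontab -e)
# Any output from the job is mailed to this address.
MAILTO=root
# minute hour day-of-month month day-of-week command
30 2 * * * /usr/local/sbin/nightly-backup</pre>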
<h2>Testing</h2>
<p>If you&#8217;re using cron, it&#8217;s relatively easy to set things up so you get mail if the backup script fails for whatever reason. Actually reading the files is simple enough &#8212; just go into the mounted backup filesystem and do whatever you need to do (either as part of a random audit, or because you made a mess and actually need to).</p>
<p>If you want a more extensive test, it isn&#8217;t difficult to do bitwise comparisons of the backup filesystem (or portions thereof) versus your running system. Just bear in mind that not all differences are errors; some files are supposed to change, after all.</p>
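<p>One way to sketch such a comparison is a small shell function built on <code>cmp</code>. The function name <code>compare_trees</code> and the paths in the usage line are hypothetical, and the loop assumes file names contain no newlines.</p>
<pre style="padding-left: 30px"># compare_trees LIVE BACKUP
# Print every regular file under LIVE that is missing from BACKUP
# or whose contents differ byte-for-byte.
compare_trees() {
    live=$1
    backup=$2
    # List files by path relative to the live tree's root.
    ( cd "$live" || exit 1
      find . -type f -print ) | while IFS= read -r f; do
        if [ ! -f "$backup/$f" ]; then
            echo "missing: $f"
        elif ! cmp -s "$live/$f" "$backup/$f"; then
            echo "differs: $f"
        fi
    done
}

# Example (illustrative paths): compare_trees /home /mnt/home</pre>
<p>Expect some noise from files that changed since the backup ran; the interesting entries are the ones that should not have changed.</p>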
<h2>Problems and Limitations</h2>
<p>The chief problem I&#8217;ve encountered so far is inelegant handling of hard links in the source data. If <em>foo</em> and <em>bar</em> are hard links which refer to the same underlying data, a given generation of my backup will contain two copies of that data. It may be that I can solve this by being smarter about what options I pass to rsync. In some cases, using symlinks instead might be a workaround.</p>
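<p>For what it&#8217;s worth, rsync does have an option aimed at this: <code>-H</code> (<code>--hard-links</code>) asks it to recreate hard links among the files in a transfer rather than copying each name separately. A minimal sketch, with illustrative source and destination paths, and not tested against the backup script described here:</p>
<pre style="padding-left: 30px"># -a: archive mode (permissions, times, symlinks and so on)
# -H: preserve hard links between files within this transfer
rsync -aH /home/ /private-mnt/backup/home/</pre>
<p>Note that <code>-H</code> only links files that are both inside the set being transferred, and it costs rsync extra memory and time on large trees.</p>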
<p>Another potential hassle is automating the process of selectively throwing away older generations of backups. (Actually getting rid of a generation is not a problem &#8212; you just remove the directory tree.) Right now, I deal with this by re-formatting the drives at a certain point in the rotation. It wouldn&#8217;t be a bad idea to write a script that would do something like keep all backups less than a month old, then single generations at two, three, six and twelve months. I just haven&#8217;t done this yet.</p>
<p>While there is a facility for excluding certain data on the running filesystem from the backup (using the rsync --filter=dir-merge option), it is not exactly straightforward to use. (However, it&#8217;s dead easy to check the next day&#8217;s backups to see if you got the set of files you expected.)</p>
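<p>As a rough illustration of the dir-merge facility: the command below tells rsync to look for a file named <code>.rsync-filter</code> in each directory it visits and apply the rules found there to that directory and everything below it. The paths and the filter file&#8217;s contents are assumptions chosen for the example.</p>
<pre style="padding-left: 30px"># -F is rsync's shorthand for this same filter rule.
rsync -a --filter='dir-merge /.rsync-filter' /home/ /private-mnt/backup/home/</pre>
<p>A user could then drop a file like this into their home directory (here a hypothetical /home/alice/.rsync-filter):</p>
<pre style="padding-left: 30px"># A leading slash anchors the pattern to the directory holding this file.
- /Downloads/
# No leading slash: matches at any depth below this directory.
- *.tmp</pre>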
<p>This backup scheme is resilient in the face of any single piece of backup media failing (in addition to whatever failure caused you to need to restore from backup in the first place). Beware of monoculture vulnerabilities in your hardware, though. I found a bargain on a loss-leader drive plus enclosure combo at a local shop, and bought several. All had power supplies which failed after a relatively short period of use. Buy enclosures from different vendors, and make sure the drives inside are different makes as well. (There are lots of stories about RAID arrays where all the drives came from the same lot, and all failed within hours of one another. Heed such stories.)</p>
<p>Some USB drive enclosures play silly mapping games, using a portion of your drive for their own nefarious purposes. Avoid these, as it is possible that if you transfer the drive to another enclosure, you&#8217;ll be unable to see your partition table without vigorous hackery. In fact, I&#8217;d suggest trying the drive-swap game right away when you get a new enclosure. Enclosures purchased alone seem less likely to exhibit annoying incompatibilities than those sold bundled with a drive.</p>
<p>This backup scheme is extremely simpleminded, in that it works by just copying all the files one at a time, whenever it is run. This works great for most things, but can be a problem for things that are in the midst of being updated when that copy is made. Relational databases are especially troublesome in this regard. You can work around this by programmatically dumping the database contents to a flat file just prior to running the backup. (Both MySQL and PostgreSQL offer means of doing this without shutting down the database.)</p>
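<p>A pre-backup dump step might look roughly like the following. The output paths are illustrative, and the sketch assumes credentials are already arranged (for example a <code>~/.my.cnf</code> for MySQL, and running the PostgreSQL command as a database superuser).</p>
<pre style="padding-left: 30px"># MySQL: --single-transaction takes a consistent snapshot without
# locking, but only for transactional (InnoDB) tables.
mysqldump --single-transaction --all-databases --result-file=/var/backups/mysql-all.sql

# PostgreSQL: dump every database in the cluster to one SQL file.
pg_dumpall --file=/var/backups/postgres-all.sql</pre>
<p>The nightly backup then picks up the dump files like any other data.</p>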
<p>USB drive enclosures, with a few exceptions, do not support the commands needed to access reliability information available on the underlying drive hardware (SMART and the like). While tools like smartctl only give you advance notice of a fraction of drive failures, some warning is better than none. Hotplug ATA or SCSI gets around this problem, but adds others (chiefly expense and difficulty finding compatible hardware on short notice).</p>
<p>This backup scheme makes it much less likely you&#8217;ll lose data by accident, but complicates life if for some reason you need to get rid of all the copies of a given file.</p>
<h2>Conclusion</h2>
<p>I doubt that anyone else will want to use exactly the backup scheme I use, but I hope that some of the ideas here will help some of my readers. I also hope that you&#8217;ll point out my inevitable mistakes, and make suggestions if you have ideas for improvements.</p>
<p>Back up your data. It doesn&#8217;t matter how, so long as it&#8217;s complete, current and actually tested. More of your life than you think is on your computer. Think about someone being interviewed after a house fire &#8212; if all the people and pets are safe, their big worry is usually things like family photo albums. You have files on your computer that are analogous to &#8212; and in some cases, literally are &#8212; those photo albums. And they depend on a complicated, delicate machine that was mass-produced as cheaply as possible.</p>
<p>Protecting your files <a href="http://www.penny-arcade.com/comic/2005/8/10/">requires conscious action on your part</a>. It isn&#8217;t hard, but if you don&#8217;t take it, you&#8217;re living your digital life on borrowed time.</p>
]]></content:encoded>
			<wfw:commentRss>http://mythopoeic.org/backups-on-the-home-front/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Barbarians at the gate: Excluding Bing via robots.txt</title>
		<link>http://mythopoeic.org/barbarians-at-the-gate-excluding-bing-via-robots-txt/</link>
		<comments>http://mythopoeic.org/barbarians-at-the-gate-excluding-bing-via-robots-txt/#comments</comments>
		<pubDate>Sun, 13 Sep 2009 02:24:27 +0000</pubDate>
		<dc:creator>dhenke</dc:creator>
				<category><![CDATA[Compugeekery]]></category>
		<category><![CDATA[Privacy]]></category>

		<guid isPermaLink="false">http://mythopoeic.org/?p=21</guid>
		<description><![CDATA[This isn&#8217;t about the variety of cherry. If you haven&#8217;t heard of Bing.com, it&#8217;s Microsoft&#8217;s recent attempt at a search engine. (If you&#8217;re curious, Google it.) If you are not particularly pleased with the idea of a company like Microsoft making money from your creative work, then my strong suggestion is to create a robots.txt [...]]]></description>
			<content:encoded><![CDATA[<p>This isn&#8217;t about the variety of cherry. If you haven&#8217;t heard of Bing.com, it&#8217;s Microsoft&#8217;s recent attempt at a search engine. (If you&#8217;re curious, Google it.)</p>
<p>If you are not particularly pleased with the idea of a company like Microsoft making money from your creative work, then my strong suggestion is to create a <code>robots.txt</code> file in the root of your web-space, with contents not unlike:</p>
<blockquote><p>User-agent: msnbot<br />
Disallow: /</p>
<p>User-agent: *<br />
Disallow:</p></blockquote>
<p>The robots.txt is a voluntary standard which allows web page authors to exclude search engines from part or all of their sites. There&#8217;s a <a href="http://www.robotstxt.org/">helpful website</a> that has details about why you might want to do this, and how to go about it.</p>
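<p>To keep crawlers out of only part of a site, give a path prefix instead of the bare slash. A minimal sketch, assuming a hypothetical <code>/private/</code> directory you want left alone by every robot:</p>
<blockquote><p>User-agent: *<br />
Disallow: /private/</p></blockquote>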
<p>Of course, there have been some <a href="http://www.seroundtable.com/archives/020728.html">allegations</a> that Bing isn&#8217;t honoring the robots.txt standard. But announcing that they&#8217;re unwelcome is a fine symbolic act, even should they fail to honor your wishes.</p>
]]></content:encoded>
			<wfw:commentRss>http://mythopoeic.org/barbarians-at-the-gate-excluding-bing-via-robots-txt/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

