openswan ipsec in ec2

Posted by peter on August 24, 2011

This may be totally invalid given Amazon rolling out cross-region VPC a few weeks ago, but for those who still insist on rolling their own…

I was dealing with setting up ipsec (Openswan) in EC2 for some folk, which included, among other things, cross-region EC2-instance-to-EC2-instance links. We had endless trouble with connections just suddenly dying. UDP isn’t the easiest thing to get right with NAT, and though it’s hard to be conclusive (especially when debugging Linux ipsec, which isn’t the easiest thing to follow in and out of the kernel), I point my blame-finger at bad interactions with double-NAT between EC2 regions.

The problem was eventually solved with a combination of aggressive dead peer detection settings (dpddelay=4 dpdtimeout=16) and (the trickier setting to find) adding disable_port_floating=yes to the config setup section of ipsec.conf. That setting stops pluto from changing what port it communicates on, which, I assume, makes an easier job for Amazon’s NAT. This also means NAT-T behavior is probably not going to work with other vendors’ implementations in this setup, as pluto doesn’t listen on 4500 anymore, but we’re Openswan everywhere, and it’s made our links stable.
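
For reference, here is roughly where those settings live in ipsec.conf. Treat it as a sketch rather than our exact config: the connection name is made up, the left/right parameters are left out, and dpdaction=restart is an assumption on my part (the settings named above are only dpddelay, dpdtimeout and disable_port_floating).

```
# /etc/ipsec.conf (fragment)
config setup
	# keep pluto on its original port instead of floating to 4500
	disable_port_floating=yes

# "ec2-cross-region" is a hypothetical connection name
conn ec2-cross-region
	# probe the peer every 4 seconds, declare it dead after 16
	dpddelay=4
	dpdtimeout=16
	# assumption: renegotiate when the peer is declared dead
	dpdaction=restart
	# left=/right= and auth parameters omitted
```

The DPD options are per-connection, while disable_port_floating only makes sense under config setup, which is part of why it took a while to find.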

how to un-shoot your foot

Posted by peter on September 29, 2008

Recently I shot myself in the foot pretty badly. We have a ~12TB data array that was set up as a raw LVM2 device, with no partition table. There were some issues with one of the cluster members, so I went to rescue boot to attempt to correct it. Red Hat’s rescue boot, in the name of spreading democracy, proactively offers to put a partition table on anything that doesn’t have one already, and I fat-fingered the key that made this happen right over the top of our array. Whoops.

This isn’t impossible to recover from. The only thing that got overwritten was, for the most part, LVM data, and it so happens that the boundaries of the volumes are more or less known (25% each), so recovery should be possible just by re-writing correct (or near-correct) headers that set the boundaries in the right spot. The problem with this approach is that you get one shot, and it’s not like we have another 12TB sitting around that we can copy the data to. Here’s where device-mapper came in to save my ass.