AWS Private IP’s on Ubuntu 18.04 with Netplan

Ugh. Uggggggggghhhhhhhhhhhhh. Ffffffffffffffffffffffffffffffffffff. Twelve hours to get this damn thing right. OK, deep breath. Use Google, ask sys admin friends, LINT your YAML. Spin up alternate instances because you mis-configured Netplan and it took the server offline. Realize your snapshot was designed too late and you’re totally fucked because of course the dev team doesn’t have backups. Stop what you’re doing because a domain expired on an app that has traction and fuck how did I miss that? Been a ball-busting week. Live and learn. Wear a Thai steel cup next time, discomfort aside, the protection is second to none.

Sigh heavily. Have a Lagavulin (or 3). Smoke half a box of Davidoff cigarillos…get wired for sound…that’s a lot of nicotine for someone who has one cigar a week, two at most. Don’t think about how much was spent between the scotch and cigarillos, or how quickly you consumed either one of them. Deep breath (again). Yoga…just kidding, another scotch and two more cigarillos. Three more and this pack is done…can’t just run down to the nearest convenience store…fuck, the closest tobacconist that will have something worth smoking is in Sacramento and that’s an hour round trip if traffic is light…fuck, it’s Saturday, traffic won’t be light…better manage these last two cigarillos. Hey, here’s an idea, write a blog post, haven’t done that in a while…hmm…but what to write about? Dammit, this server needs to be ready yesterday. System administrators understand the pain but customers don’t, especially when the design team dragged ass and turned in half-assed work. Fuck, cost a design client, but can’t leave them hanging, that’s awful customer service. There’s too little care about customers today as-is, oh no, the customer hurt my feelings or wrote some awful shit about me. Fuck, time to scrub Yelp…I mean, ask a customer politely to change their review. Fuck it, they’ve never been good with password management, won’t remember what they wrote, and don’t have time to go back and look at their past reviews. Sure, some people might, but that customer won’t. Where were we? That’s right, shitty customer service. Won’t provide bad customer service, that’s bad business. ifconfig is deprecated in favor of ip, and that YAML config is a bitch dealing with multiple IP’s…it isn’t supposed to be, but for whatever reason it doesn’t want to take. vi keeps enforcing its own formatting on the YAML, which is pretty awful when you consider the importance of formatting in YAML…like Python. Netplan has extensive documentation, and even some great examples of config files.

Thought about sharing the first iteration of the config file (set routes at your own risk…broke the server twice trying that shit), but that seems pointless. No one wants to see what doesn’t work. You wouldn’t have bothered reading this far if you did. IP’s have been changed to protect the innocent…

They say there’s no atheist in a foxhole, but there’s no atheist when the network connection keeps failing and there’s no backup, either. AWS shows terminal screenshots, if only they had some way to virtually connect locally rather than via SSH… Oh, look, the public IP changed, maybe it will work as soon as I update PuTTY…fuck. Let’s add another private IP and an Elastic IP to that, and maybe that will work…fuck. No connection, the server is fucked. I’m fucked.

Conversation with a friend with AWS Certification. Suggests shutting down the instance, spinning up a new one, and mounting the volume to the new instance. Done this with VM’s dozens of times. Guess sometimes we’re just too close to the problem to see a solution. Fuck, I should probably eat while this volume is detaching from the original and attaching to the new one. Alcohol has calories, that counts, right? Half a bottle. Fuck me. So much for going and grabbing a bite somewhere, not that I really have the time, but shit it would be nice to get out of the house today. Nope, back to the grind…12 hours in and counting. Damn, JoyRun was great, too bad as soon as you move out of a college town it doesn’t work anymore. Maybe Postmates delivers? Ha! Not here. The only pizza place still open is Domino’s…“pizza”.

I need to know more local people, convince someone to do a food run for me while I’m grinding through this damn server. OK. Deep breath, the first sign of life. The volume mounted and I was able to edit the YAML file, get rid of the offending code (it was an attempt to separate the addresses into ifconfig-style ethX:format):

# don't do this
# for real
# it's gonna take down your fuckin' server
# some other YAML config
    eth1: # this is fine
        addresses: # still fine
            - 10.0.0.5 # yup, still fine
            - 10.0.0.6 # still
    eth1s0: # woah, nellie
        addresses: # doesn't matter, the previous line already broke shit

This works if the system has an interface eth1s0; if not, the IP configuration gets fucked. Yes, fucked proper. No clue why people keep asking.
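
Here is a rough way to catch that before netplan apply does it for you. It's a minimal sketch, not anything netplan ships: the function names are made up, the config is passed in as an already-parsed dict (YAML parsing is left out on purpose), and it assumes a Linux box where /sys/class/net lists the interfaces.

```python
import os


def system_interfaces(sys_class_net="/sys/class/net"):
    """List the interface names the Linux kernel currently exposes."""
    return set(os.listdir(sys_class_net))


def missing_interfaces(netplan_config, present):
    """Return ethernet names from the config that are not in `present`."""
    ethernets = netplan_config.get("network", {}).get("ethernets", {})
    missing = []
    for name, settings in ethernets.items():
        # With a match: stanza the key is only an ID, not an interface name,
        # so those entries are skipped.
        if isinstance(settings, dict) and "match" in settings:
            continue
        if name not in present:
            missing.append(name)
    return sorted(missing)


# Already-parsed form of the broken example above.
config = {
    "network": {
        "ethernets": {
            "eth1": {"addresses": ["10.0.0.5", "10.0.0.6"]},
            "eth1s0": {"addresses": None},
        }
    }
}

# On a real server, pass present=system_interfaces() instead of a fixed set.
print(missing_interfaces(config, present={"lo", "eth1"}))
```

Anything it prints is a name netplan would be asked to configure that the kernel has never heard of.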

Time to knuckle down and learn the ip tool…

ubuntu:~$ sudo ip -c a 
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
    link/ether 02:07:aa:ff:f9:36 brd ff:ff:ff:ff:ff:ff
    inet 10.0.16.1/20 brd 10.0.31.255 scope global dynamic ens5
       valid_lft 2195sec preferred_lft 2195sec
    inet6 fe80::7:aaff:feff:f936/64 scope link
       valid_lft forever preferred_lft forever
3: ens6: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 02:44:ac:46:58:e0 brd ff:ff:ff:ff:ff:ff

Do you see what I see? Further, why did it take me twelve fucking hours to see this???  Chalk it up to the aging brain not learning new things nearly as quickly as it used to. Look… ens5 shows “UP,LOWER_UP” while ens6 does not. The good news is, Ubuntu recognizes the new interface and wants to work with it. Fresh Ubuntu 18.04 install (on what I think is the seventh instance in the last 90 minutes); I had not touched the network configuration on this system…yet. Fortunately, that disconnected volume saved my bacon…keep backups, kids. Take snapshots before you muck with network configurations. I know better and didn’t until it was too late.
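
If you would rather not squint at angle brackets for twelve hours, the same check can be sketched in a few lines. This is an illustration only: the function names are mine, it parses the plain-text output shown above, and it assumes you captured that output without -c so there are no color codes in it.

```python
import re

# Matches header lines such as "3: ens6: <BROADCAST,MULTICAST> mtu 1500 ..."
HEADER = re.compile(r"^\d+: (?P<name>[^:@]+)(?:@\S+)?: <(?P<flags>[^>]*)>")


def link_flags(ip_output):
    """Map each interface name to the set of flags inside its <...> block."""
    flags = {}
    for line in ip_output.splitlines():
        match = HEADER.match(line)
        if match:
            flags[match.group("name")] = set(match.group("flags").split(","))
    return flags


def down_interfaces(ip_output):
    """Return the interfaces whose flag set lacks UP."""
    return sorted(
        name for name, flags in link_flags(ip_output).items() if "UP" not in flags
    )


# Header lines copied from the output above.
sample = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
2: ens5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
3: ens6: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
"""

print(down_interfaces(sample))  # ['ens6']
```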

No one ever tells you how much time sysadmins spend waiting on shit. Roughly a third of my day has been waiting at this point, maybe more. Word to the wise, netplan reads its config files in lexical order, lowest number to highest, and according to the documentation a later file amends or overrides the same keys from an earlier one, not the other way around. I set my extra private IP config file to 01 at first and the settings did not end up the way I expected, so pay attention to the numbering.
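
One gotcha with the numbering: as far as I can tell from the docs, the order is a plain string sort on the file name, not a numeric one. A tiny sketch of what that means in practice (50-base.yaml and 9-late.yaml are made-up names for illustration):

```python
def netplan_read_order(filenames):
    """Order config file names the way a lexical (string) sort would."""
    return sorted(filenames)


files = [
    "60-extra-privates.yaml",
    "01-extra-privates.yaml",
    "50-base.yaml",  # hypothetical
    "9-late.yaml",   # hypothetical
]

for name in netplan_read_order(files):
    print(name)
```

Note where 9-late.yaml lands: after 60, because "9" sorts after "6" as a character. Zero-pad your prefixes.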

Configuration is in, and I run netplan apply…

Config File:

# /etc/netplan/60-extra-privates.yaml # yup, I'm 12

network:
    version: 2
    renderer: networkd
    ethernets:
        ens6:
            dhcp4: off
            dhcp6: on
            addresses:
                - 10.31.21.174/32
                - 10.31.21.44/32
                - 10.31.21.184/32
                - 10.31.21.6/32
                - 10.31.21.22/32
                - 10.31.21.150/32
                - 10.31.21.212/32
                - 10.31.21.196/32
                - 10.31.21.99/32
                - 10.31.21.157/32
            match:
                macaddress: 02:44:ac:46:58:e0
            set-name: ens6
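
Since a typo in that address list is an easy way to lose an afternoon, here is a small sanity check for the entries. It's a sketch using Python's standard ipaddress module; check_addresses is my own name, and it only catches the obvious stuff (missing prefix length, malformed address, duplicates), not whether AWS has actually assigned the address to the interface.

```python
import ipaddress


def check_addresses(addresses):
    """Return a list of human-readable problems; empty means nothing obvious."""
    problems = []
    seen = set()
    for entry in addresses:
        # Netplan wants CIDR notation, so an entry without /prefix is suspect.
        if "/" not in entry:
            problems.append(f"{entry}: missing /prefix length")
            continue
        try:
            iface = ipaddress.ip_interface(entry)
        except ValueError as exc:
            problems.append(f"{entry}: {exc}")
            continue
        if iface.ip in seen:
            problems.append(f"{entry}: duplicate address")
        seen.add(iface.ip)
    return problems


print(check_addresses(["10.31.21.174/32", "10.31.21.44/32"]))  # []
print(check_addresses(["10.0.0.5"]))
```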

Rerun ip -c a…

ubuntu:~$ sudo ip -c a 
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
    link/ether 02:07:aa:ff:f9:36 brd ff:ff:ff:ff:ff:ff
    inet 10.0.16.1/20 brd 10.0.31.255 scope global dynamic ens5
       valid_lft 3596sec preferred_lft 3596sec
    inet6 fe80::7:aaff:feff:f936/64 scope link
       valid_lft forever preferred_lft forever
4: ens6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 02:44:ac:46:58:e0 brd ff:ff:ff:ff:ff:ff
    inet 10.31.21.174/32 scope global ens6
       valid_lft forever preferred_lft forever
    inet 10.31.21.44/32 scope global ens6
       valid_lft forever preferred_lft forever
    inet 10.31.21.184/32 scope global ens6
       valid_lft forever preferred_lft forever
    inet 10.31.21.6/32 scope global ens6
       valid_lft forever preferred_lft forever
    inet 10.31.21.22/32 scope global ens6
       valid_lft forever preferred_lft forever
    inet 10.31.21.150/32 scope global ens6
       valid_lft forever preferred_lft forever
    inet 10.31.21.212/32 scope global ens6
       valid_lft forever preferred_lft forever
    inet 10.31.21.196/32 scope global ens6
       valid_lft forever preferred_lft forever
    inet 10.31.21.99/32 scope global ens6
       valid_lft forever preferred_lft forever
    inet 10.31.21.157/32 scope global ens6
       valid_lft forever preferred_lft forever
    inet6 fe80::44:acff:fe46:58e0/64 scope link
       valid_lft forever preferred_lft forever

Looks a little better, at least ens6 is UP now…

IP addresses are all showing, that’s a good sign. ifconfig gives a different response, but we don’t have to worry about that anymore…we’re ip users now!
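
Eyeballing ten addresses gets old fast. If your iproute2 build supports JSON output (ip -j addr; I'm assuming yours does, check before relying on it), the comparison against the config can be sketched like this. The function name and the hand-written sample are mine, trimmed down to the keys the code reads.

```python
import json


def missing_addresses(ip_json, ifname, expected):
    """Return the expected IPv4 addresses (CIDR strings) that ifname lacks."""
    for link in json.loads(ip_json):
        if link.get("ifname") == ifname:
            have = {
                f'{info["local"]}/{info["prefixlen"]}'
                for info in link.get("addr_info", [])
                if info.get("family") == "inet"
            }
            return [addr for addr in expected if addr not in have]
    return list(expected)  # interface not present at all


# Hand-written stand-in for `ip -j addr` output, not a real capture.
sample = json.dumps([
    {
        "ifname": "ens6",
        "addr_info": [
            {"family": "inet", "local": "10.31.21.174", "prefixlen": 32},
            {"family": "inet6", "local": "fe80::44:acff:fe46:58e0", "prefixlen": 64},
        ],
    }
])

print(missing_addresses(sample, "ens6", ["10.31.21.174/32", "10.31.21.44/32"]))
```

On the server you would feed it the real thing, for example the output of subprocess.run(["ip", "-j", "addr"], capture_output=True, text=True).stdout.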

But…(there’s always a but)…going to the public IP attached to that private IP still comes up with nothing. Time to run through the ol’ checklist…

  1. Is NGINX running? If so, better catch it…if not, restart it.
  2. Is PHP ok? Is PHP-FPM running and is it the right version? (7.2 as of this writing)
  3. Is NGINX configured properly?
  4. Is each site configured to the appropriate private IP address?
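
For items 1 and 4, a quick way to see whether anything answers on each private IP at all is a plain TCP connect. This is a minimal sketch with made-up function names; it has to run on the instance itself or from somewhere inside the VPC, and it only proves something is listening, not that it is the right site.

```python
import socket


def port_open(host, port, timeout=2.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_sites(addresses, port=80):
    """Map each address to whether something accepted a connection on port."""
    return {addr: port_open(addr, port) for addr in addresses}


# Example call, using two of the addresses from the config above:
# print(check_sites(["10.31.21.174", "10.31.21.44"]))
```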

Well, shit. I’ve never hoped I fucked up an NGINX config file so badly before.

Day two; already two hours in. Strong black coffee and an apple…should have eaten more. Should have added Tullamore Dew to the coffee.

Finally came across Thomas Ward’s answer on AskUbuntu which says sub-interfaces aren’t possible (Netplan supports multiple IP’s on a single interface). Has this all been a snipe hunt? In all fairness, Netplan’s documentation says so, had I only read those examples closer…FML (and also, blocking all connections to the server with my virtual interface pretty much taught me as much…but at least now we have different ways of phrasing the question). According to user slangasek on AskUbuntu, multiple IP’s on a single interface is considered best practice (as compared to virtualizing additional interfaces). Clearly, I’m not a networking geek and don’t keep up (I’m much more Dev than Ops). But hey, what more can we ask than constantly improving and learning new things? Sure, ifconfig no longer ships with the most recent Ubuntu, but I could have rolled back to 16.04 or even installed ifconfig and overridden the configuration, telling Ubuntu to return to the Old Ways. No one likes change, but no wise man denies its inevitability (disclaimer: I’m no wise man). I’m trusting slangasek on this; I lack the time to research networking best practices, and Netplan is favored over ifconfig on Ubuntu 18.04, so good enough for me. Forward we march toward insanity greatness!

Tested switching ens6 to dhcp4: true…bad idea. Detaching the network interface saved my bacon. Beats spawning yet another instance. FWIW, you might see examples setting dhcp4 to either on/off or true/false…both work. Back to the drawing board. Don’t need bridging, the IP’s attach direct to the interface and already show up with the ip -c a command. Sigh.

While I’m thinking about it, here’s a couple aliases I add to make jumping screens easier (screen is a lifesaver, especially when the server disconnects you):

# ~/.bashrc

# ... lots of stuff here...
# somewhere around line 95...
# screen aliases
alias s='screen -x'
alias ss='screen -S'

Hey, maybe it’s a firewall issue…

ubuntu:~$ sudo iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

iptables is clear (makes perfect sense, we rely on the AWS firewall instead). AWS firewall looks good, allows connections on ports 80 and 443 (NGINX handles the redirects from 80 to 443; can’t trust users to automatically type in https yet…if ever).

Two hours later and I realize the Elastic IP’s are bound to the VPC, not the EC2 instance itself (the new way on AWS). FML. Time to figure this shit out…and learn the VPC has no default access to the outside world (which seems odd seeing as I can hit the AWS-set public IP, but maybe that’s bound to the EC2 instance and not the VPC; I just don’t know). Elastic IP’s already set up means I don’t have to worry about assigning public IP’s (seems obvious).

Time to enable Flow Logs via CloudWatch. “Good” news…the VPC blocks the connections. OK, time to look at the ACL…everything open on 0.0.0.0/0…sigh. Time to look at the Internet Gateway again.

Checklist time…

  1. Does the VPC have an Internet Gateway? Yuuup.
  2. Does the subnet route table point to the Internet Gateway? Yuuuuuup.
  3. Do the instances in the subnet have globally unique IP addresses? Yuuuuuuuuup.
  4. Do the network access control and security group rules allow relevant traffic to flow to and from the instance? It sure looks like it, but CloudWatch tells a different story…
  5. If I visit one of the Elastic IP’s connected to the interface from my browser, does it work? NOPE. FML.
  6. Maybe it’s just me? Not according to “Down for Everyone or Just Me?”

So. Reviewing CloudWatch shows a request to the Elastic IP pointed to the Private IP pointed to the correct EC2 instance was Rejected. OK, back to the old drawing board, time to figure out where the rejection occurs (and why). For a second there, I thought I was losing my mind and couldn’t read the log, but it turns out I was correct. I’d like to say I’m only wrong when I think I’m wrong, but this post proves otherwise.

OK, the logs show an ACCEPT on 80 and 22, at least for the Network Interface in the current subnet. So the Elastic IP’s pass to the Private IP’s no problem. Good to know. If you ever need a reminder that your servers are under constant attack from everywhere, record REJECT VPC Flow Logs to CloudWatch.
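
Reading those records by eye is how I nearly convinced myself I couldn't read. A sketch of pulling out the REJECTs, assuming the default (version 2) space-separated flow log format. The two sample lines are invented for illustration, not copied from my logs, and the function names are mine.

```python
from collections import namedtuple

# Field order of the default (version 2) VPC Flow Logs record format.
FlowRecord = namedtuple(
    "FlowRecord",
    "version account_id interface_id srcaddr dstaddr srcport dstport "
    "protocol packets bytes start end action log_status",
)


def parse_flow_record(line):
    """Split one default-format record into named fields (all strings)."""
    parts = line.split()
    if len(parts) != len(FlowRecord._fields):
        raise ValueError(
            f"expected {len(FlowRecord._fields)} fields, got {len(parts)}"
        )
    return FlowRecord(*parts)


def rejected(lines, dstport=None):
    """Return the REJECT records, optionally only those for one dest port."""
    hits = []
    for line in lines:
        rec = parse_flow_record(line)
        if rec.action != "REJECT":
            continue
        if dstport is None or rec.dstport == str(dstport):
            hits.append(rec)
    return hits


# Invented records; account, ENI and addresses are placeholders.
sample = [
    "2 123456789012 eni-0123456789abcdef0 203.0.113.9 10.31.21.174 50000 443 6 3 180 1560000000 1560000060 REJECT OK",
    "2 123456789012 eni-0123456789abcdef0 203.0.113.9 10.31.21.174 50001 80 6 5 300 1560000000 1560000060 ACCEPT OK",
]

for rec in rejected(sample, dstport=443):
    print(rec.srcaddr, "->", rec.dstaddr, rec.dstport, rec.action)
```

If your export prepends a timestamp or uses a custom format, the field count check will complain, which is the point.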

Maybe that’s a breakthrough… 

When looking at the NGINX default configuration page, the page loads just fine. Every Elastic IP on that network interface loads just fine…with the default page. It looks like NGINX doesn’t recognize the local IP…two steps forward and 47 back. Can’t find anything on Google about NGINX seeing (or not seeing) IP’s defined in Netplan. Almost every example shows a config that doesn’t specify an IP. Back to conversing with my AWS Certified friend…suggests I bind the domain to the directory (the server_name directive, which I did) and don’t worry about defining the IP on the NGINX config. Sometimes we’re too close to the problem to see a clear solution.

Well. NGINX is ignoring every private IP I’ve got except for the primary one (they all come up when running ip -c a, so I know Ubuntu sees them).
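
One thing that can at least be ruled in or out from the server side: whether a process is allowed to bind to each private IP at all, which is what an address-specific listen directive (listen 10.31.21.174:80;) would need. This is a rough sketch with a made-up function name; it says nothing about which server block NGINX picks, so treat it as a narrowing-down step, not a diagnosis.

```python
import socket


def can_bind(ip, port=0):
    """True if the OS lets a TCP socket bind to ip (port 0 = any free port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((ip, port))
        except OSError:
            return False
        return True


# Example call on the instance, using addresses from the config above:
# for ip in ["10.31.21.174", "10.31.21.44"]:
#     print(ip, can_bind(ip))
print("127.0.0.1", can_bind("127.0.0.1"))
```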

The most frustrating thing…none of this is uncommon and shouldn’t be difficult. AWS supports multiple IP’s on a Network Interface for easy swapping when EC2 instances have problems.

You came here to understand how to assign multiple private IP’s to a single AWS instance using Netplan, and we all got derailed. However, I think it’s safe to say I made a bunch of mistakes so you don’t have to.

If you’re anything like me, you were really hoping for a happy ending to this story. There is one, in the sense that the server is finally back up and churning out happy little 1’s and 0’s. However, I “fixed” it by rolling back to 16.04 and using ifconfig. I’m unsure NGINX has any way to work with the setup I had. I’ve talked with experts, done countless Google searches, and finally said fuck it…shit has to work and it has to work on a deadline. In a perfect world, I’d have had the time to delve into the NGINX and Netplan documentation headfirst and resolve the issue (assuming it can be resolved within the limitations of both). I might have used Apache instead, or done any number of things.

Unfortunately, that’s not how it went. But from a delivery perspective, the customer is happy with his robust new server and I no longer have to lose a moment of sleep on this roller coaster I spent the last two days on.

Happy travels!
