Saturday, September 23, 2017

Port Priority based Root Port Election in STP

This concept is misunderstood by many people, mostly because of the topologies they use to learn it.

When a switch has two interfaces connecting to the same neighboring switch and the cost to reach the root bridge is the same on both, it will use the interface that receives the BPDU with the lowest sender port ID (port priority, then port number) as the root port..

For example, let's say SW-A has the lowest MAC address (with equal bridge priorities), hence SW-A will be the Root..

What will be the Alternate Port here?
Is it the e0/1 of SW-B?
Yes

By just looking at the port numbers, we can say that..
What will be the Alternate Port here?
Is it the e0/3 of SW-B?
No
E0/1 of SW-B will be the Alternate Port..




This is where most people answer incorrectly.. The port priority that is considered is not the port priority of SW-B. It is the port priority of the BPDU sender, which in this case is the Root Bridge (SW-A).. The sender's port priority is what matters..

If we look at the output of show spanning-tree; (I use actual hardware from here)
































Because the port priority is 128 by default, which is equal on all ports, it boils down to the port number.. The port receiving the BPDU with the lower sender port number becomes the root port..

Let's change the port priority manually on SW-A..

SW-A(config-if)#int fa0/2
SW-A(config-if)#spanning-tree port-priority 16

































Now you can see the Alternate Port is changed to Fa0/3 of SW-B..

Additional Note:-
Decision making process of STP is like the following..

(1) Lowest bridge ID: the switch with the lowest bridge ID becomes the root bridge.
(2) Lowest path cost to Root Bridge: when the switch receives multiple BPDUs it will select the interface that has the lowest cost to reach the root bridge as the root port.
(3) Lowest sender bridge ID: when a switch is connected to two switches that it can use to reach the root bridge and the cost to reach the root bridge is the same, it will select the interface connecting to the switch with the lowest bridge ID as the root port.
(4) Lowest sender port ID: when the switch has two interfaces connecting to the same switch, and the cost to reach the root bridge is the same, it will use the interface that receives the BPDU with the lowest sender port ID (port priority, then port number) as the root port.
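
Steps (2) to (4) can be sketched as a single sort key. This is only an illustrative Python model, not switch code: the `ReceivedBpdu` record, the port names, the cost of 19 and the root's MAC address are all assumptions made up for the example, and the root election of step (1) is taken as already done.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReceivedBpdu:
    """One BPDU as seen by the receiving switch (hypothetical model)."""
    local_port: str          # port on this switch that received the BPDU
    root_path_cost: int      # cost to the root through this port
    sender_bridge_id: tuple  # (bridge priority, MAC address) of the sender
    sender_port_id: tuple    # (port priority, port number) of the SENDER's port


def pick_root_port(bpdus):
    """Return the local port whose BPDU wins steps (2), (3) and (4)."""
    best = min(
        bpdus,
        key=lambda b: (b.root_path_cost, b.sender_bridge_id, b.sender_port_id),
    )
    return best.local_port


SW_A = (32768, "0000.0000.000a")  # assumed bridge ID of the root

# SW-B has two links to SW-A: equal cost, same sender bridge ID.
default = [
    ReceivedBpdu("Fa0/1", 19, SW_A, (128, 1)),
    ReceivedBpdu("Fa0/2", 19, SW_A, (128, 2)),
]
# Same links after lowering the port priority to 16 on SW-A's second port.
tuned = [
    ReceivedBpdu("Fa0/1", 19, SW_A, (128, 1)),
    ReceivedBpdu("Fa0/2", 19, SW_A, (16, 2)),
]

print(pick_root_port(default))  # Fa0/1
print(pick_root_port(tuned))    # Fa0/2
```

Note that nothing configured on the receiving switch appears in the key: only values carried in the BPDUs decide the tie.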

Tale of 2 BPDUs & Their Propagation in Legacy STP

There are 2 types of Bridge Protocol Data Units (BPDUs)

(1) Configuration BPDUs
(2) Topology Change Notification BPDUs (TCNs)

Using these 2 types of frames, STP does everything it needs to..

A BPDU frame has the following format..






















Let's see a normal operation..

Configuration BPDUs are generated by the Root Bridge (Root Switch) and flow outward along the active paths, away from the Root Bridge. This means Non-root switches receive them on their Root ports & Blocked ports and forward them out only from their Designated ports..

Each designated switch changes the Root Path Cost, Bridge ID & Port ID of the BPDU they receive before they send them downstream..
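
The field rewriting described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the `ConfigBpdu` record, the switch names, the port IDs and the cost of 19 are hypothetical, and only the four fields mentioned here are modelled.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ConfigBpdu:
    root_id: str          # stays the same across the whole network
    root_path_cost: int   # grows as the BPDU moves away from the root
    bridge_id: str        # ID of the switch that sent this BPDU
    port_id: str          # ID of the port it was sent from


def relay(received, root_port_cost, my_bridge_id, out_port_id):
    """Build the BPDU a designated switch sends downstream."""
    return replace(
        received,
        root_path_cost=received.root_path_cost + root_port_cost,
        bridge_id=my_bridge_id,
        port_id=out_port_id,
    )


from_root = ConfigBpdu(root_id="S-1", root_path_cost=0, bridge_id="S-1", port_id="128.1")
downstream = relay(from_root, root_port_cost=19, my_bridge_id="S-2", out_port_id="128.3")
print(downstream)
```

The Root ID is the only field that is passed on untouched, which is how every switch ends up agreeing on the same root.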

Following image shows the propagation of the Configuration BPDUs in a switching network..





















Let's see what happens when a link goes down..

For example, let's assume a link on S-5 (not the link connected to S-3) goes down.
This is where the TCNs (BPDUs which can go upstream) come in.

TCNs are normally generated by Non-root switches and flow upstream towards the Root Bridge to inform it that the network topology has changed. This means Non-root switches send them out only from their Root ports and receive them only on their Designated ports..

The TCN is a very simple BPDU that carries no information beyond its header. The bridge that detected the change sends it out every Hello Time seconds (this is the locally configured Hello Time, not the Hello Time specified in the configuration BPDUs, which is set by the Root Bridge).

The designated bridge acknowledges the TCN by immediately sending back a normal configuration BPDU with the topology change acknowledgement (TCA) bit in Flag field set. The bridge that notifies the topology change does not stop sending its TCN until the designated bridge has acknowledged it.

Note that TCA is not a new BPDU.. It's just a Configuration BPDU with a bit changed..

The propagation of TCNs is indicated in red and the propagation of TCAs is indicated in green..





















Once the TCN reaches the Root, the Root also acknowledges it to the designated switch that delivered it and starts to send out its Configuration BPDUs with the topology change (TC) bit set.

These BPDUs are relayed by every bridge in the network with this bit set. As a result all bridges become aware of the topology change and each of them reduces its CAM table Aging Time to Forward Delay.

Bridges receive TC BPDUs on both forwarding and blocking ports because they are actually Configuration BPDUs.

TC BPDUs are propagated in the following way.. They are also not a new BPDU type, just Configuration BPDUs with a bit changed.





















These TC BPDUs are sent for a period of Max Age + Forward Delay seconds, which is 20+15=35 seconds by default. Here is why..

We deal with 4 timers in STP..

(1) Hello Time
(2) Max Age
(3) Forward Delay
(4) Aging Time

You can see those 4 timers in the show spanning-tree output..

















Hello Time is the interval at which the Configuration BPDUs are sent.. Default is 2s.
Max Age is the time for which a BPDU is considered valid by a switch.. Default is 20s.
Forward Delay is the time it takes to move from Listening State to Learning State and again from Learning State to Forwarding State.. Default is 15s.
Aging Time is the time for which a CAM table entry stays valid.. Switches flush a MAC address entry from their CAM table when it has not been refreshed within this time.. Default is 300s.

The reason the TC BPDUs are sent for a period of Max Age + Forward Delay is to force the switches to age out the old remembered BPDUs and the stale MAC address table entries.
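
The timer arithmetic can be written down as a tiny Python sketch. It only restates the default values quoted above; the function names are made up for illustration and nothing here talks to a switch.

```python
# Default timers quoted above, in seconds.
HELLO_TIME = 2
MAX_AGE = 20
FORWARD_DELAY = 15
AGING_TIME = 300


def tc_period(max_age=MAX_AGE, forward_delay=FORWARD_DELAY):
    """How long the root keeps the TC bit set in its Configuration BPDUs."""
    return max_age + forward_delay


def cam_aging_time(topology_change_active,
                   forward_delay=FORWARD_DELAY,
                   aging_time=AGING_TIME):
    """Aging time a switch applies to its CAM table entries."""
    return forward_delay if topology_change_active else aging_time


print(tc_period())            # 35
print(cam_aging_time(True))   # 15
print(cam_aging_time(False))  # 300
```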

You can see the CAM Aging Time change to Forward Delay in the show spanning-tree output by entering the following debug command and unplugging a port of a switch..
S-1-ROOT#debug spanning-tree events




















Note:- You will not see the receipt of TC BPDUs on debug, but all the switches in the network will receive them silently and change their timers and relearn the MAC addresses..

We found 4 BPDUs in the above STP process.

(1) Config BPDU
(2) TCN
(3) TCA
(4) TC

But there are actually only 2 types which are Config BPDUs & TCNs. TCAs & TCs are just Config BPDUs with only a bit changed in the Flag field of the frame..
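
A short Python sketch makes the "2 types, 4 names" point concrete. The constants follow the 802.1D encoding as I understand it (BPDU type 0x00 for Configuration and 0x80 for TCN, flag bit 0 for TC and bit 7 for TCA); the `classify` helper itself is hypothetical.

```python
BPDU_TYPE_CONFIG = 0x00
BPDU_TYPE_TCN = 0x80

FLAG_TC = 0x01   # topology change
FLAG_TCA = 0x80  # topology change acknowledgement


def classify(bpdu_type, flags=0):
    """Name a BPDU the way this post does, from its type and flag bits."""
    if bpdu_type == BPDU_TYPE_TCN:
        return "TCN"
    if bpdu_type != BPDU_TYPE_CONFIG:
        raise ValueError("not a legacy STP BPDU type")
    names = []
    if flags & FLAG_TCA:
        names.append("TCA")
    if flags & FLAG_TC:
        names.append("TC")
    return "Config" + "".join("+" + n for n in names)


print(classify(BPDU_TYPE_CONFIG))            # Config
print(classify(BPDU_TYPE_TCN))               # TCN
print(classify(BPDU_TYPE_CONFIG, FLAG_TCA))  # Config+TCA
print(classify(BPDU_TYPE_CONFIG, FLAG_TC))   # Config+TC
```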

Wednesday, September 20, 2017

Concept of Transparent ASA

Default mode of operation for Cisco ASA is the Routed Mode. It can also operate in a mode called "Transparent Mode" which allows ASA to monitor traffic while forwarding in Layer 2 domain.

IP address of the PC: 192.168.10.10
Gateway (e0/0 of R): 192.168.10.1

We are going to put an ASA in between the PC and the Gateway without changing any configuration on the Gateway.. This is the use case of a Transparent ASA. It acts as a Layer 2 bump in the wire, so it does not show up as a routed hop to the other devices in the network.

Let's see how it is configured..

Remember to back up your configuration before changing the ASA mode from Routed to Transparent as it will clear all the current configuration.

Enter the following command to change the firewall mode to Transparent.
ciscoasa(config)#firewall transparent

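Create a BVI (Bridge Virtual Interface) and give it a management IP address from the same subnet as the bridged hosts. To know more about BVIs please go here.
ciscoasa(config)#interface BVI 1
ciscoasa(config-if)#ip address 192.168.10.150 255.255.255.0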

Configure inside interface grouping to the BVI,
ciscoasa(config)#int gig 1
ciscoasa(config-if)#security-level 100
ciscoasa(config-if)#nameif INSIDE
ciscoasa(config-if)#bridge-group 1
ciscoasa(config-if)#no shut

Configure outside interface grouping to the BVI,
ciscoasa(config)#int gig 0
ciscoasa(config-if)#security-level 0
ciscoasa(config-if)#nameif OUTSIDE
ciscoasa(config-if)#bridge-group 1
ciscoasa(config-if)#no shut

As soon as you enter the above commands, the interfaces will be displayed in the show interface ip brief output like the following..










Now the configuration is over. The PC will forward its general traffic to the internet via the gateway without knowing there is an ASA in between. But ICMP pings will not work until you configure your ASA to inspect them.. You can do it either by ASDM or CLI..
Following are the commands to do it in CLI,
ciscoasa(config)#policy-map global_policy
ciscoasa(config-pmap)#class inspection_default
ciscoasa(config-pmap-c)#inspect icmp

Note:- If there are switches on both sides of the ASA, you will have to use EtherType ACLs to allow BPDUs. By default ASA will not forward BPDUs..
If there are routers on both sides which use a routing protocol like OSPF, you will have to allow the multicast traffic in order to form adjacencies.
If you configured DHCP on the Gateway and the PC is a DHCP client, you will have to do additional configuration on the ASA to allow the broadcast traffic.
You will not be able to terminate VPNs in this mode because the interfaces work as L2 interfaces..

You can view the current mode of ASA by the following command..
ciscoasa#show firewall

You can change the ASA back to routed mode by the following command..
ciscoasa(config)#no firewall transparent

Tuesday, September 19, 2017

Protecting BGP Neighbor Relationships

Authenticating using Passwords

This is the simplest solution for both eBGP & iBGP neighbors..
Following is an example for eBGP neighbors.












R1(config)#router bgp 1
R1(config-router)#neighbor 10.0.0.2 remote-as 2
R1(config-router)#neighbor 10.0.0.2 password cisco

R2(config)#router bgp 2
R2(config-router)#neighbor 10.0.0.1 remote-as 1
R2(config-router)#neighbor 10.0.0.1 password cisco

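The password can be entered in clear text or as an already encrypted (type 7) string, as per the corporate security policy.. Either way it is used for MD5 authentication of the BGP TCP segments.
But the routers are still vulnerable to CPU DoS attacks, as they have to check each and every malformed packet an attacker sends..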

Protecting eBGP Neighbors by Changing TTL

Routers send BGP packets to eBGP neighbors with a TTL of 1 by default, which implies the neighbors should be directly connected. This is the built-in security mechanism of eBGP. But an attacker can get around it easily from a remote location by choosing an initial TTL such that the value is 1 when the packets reach the destination router.











In the above example, if the attacker sets the TTL to 3, the packet will arrive at R1 with a TTL of 1, which means R1 will think the attacker is directly connected..

The additional TTL security command looks like the following..

R1(config-router)#neighbor 10.0.0.2 ttl-security hops 1
R2(config-router)#neighbor 10.0.0.1 ttl-security hops 1

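This will change the TTL of outgoing eBGP packets to 255. Both neighbors will accept packets from each other only if they arrive with a TTL of at least 255 minus the configured hop count (254 for hops 1). Since every router on the path decrements the TTL and 255 is the maximum, a remote attacker cannot make a packet arrive with such a high TTL, which means only directly connected routers will be able to try a peering..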
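
The TTL check can be sketched as plain arithmetic in Python. This is a model of the acceptance rule as I understand it (received TTL must be at least 255 minus the configured hops), not router code; the function names and the number of routers the attacker crosses are assumptions.

```python
MAX_TTL = 255


def ttl_on_arrival(initial_ttl, routers_crossed):
    """Each router that forwards the packet decrements the TTL by one."""
    return initial_ttl - routers_crossed


def accepted_by_ttl_security(received_ttl, hops=1):
    """Acceptance rule modelled after 'ttl-security hops <hops>'."""
    return received_ttl >= MAX_TTL - hops


# Directly connected neighbor: sent with 255, no router in between.
print(accepted_by_ttl_security(ttl_on_arrival(255, 0)))  # True

# Attacker two routers away using the old trick (send 3, arrive with 1).
print(accepted_by_ttl_security(ttl_on_arrival(3, 2)))    # False

# Same attacker sending the highest possible TTL still arrives with 253.
print(accepted_by_ttl_security(ttl_on_arrival(255, 2)))  # False
```

The attacker can lower the TTL at will but can never raise it above 255, which is why checking for a high value works where checking for a low one does not.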

Note:- ebgp-multihop is not a security command. It only raises the TTL of outgoing eBGP packets to the given number so that a peering can be formed with a neighbor up to that many hops away.
You can learn more about this command here. It is still vulnerable to the above attack. The ttl-security hops command is actually the reverse logic of the ebgp-multihop command, hence you cannot use both commands together for the same neighbor..

Monday, September 18, 2017

BGP Finite State Machine

BGP FSM (Finite State Machine) has 6 states, and knowing them helps in troubleshooting..

In summary, once you have configured the neighbor IP address, the router will try to reach that neighbor IP address on destination TCP port 179 using its routing table.

When the TCP 3-way handshake completes, the router will send a BGP Open message. (This message is similar to the hello packet that EIGRP & OSPF use)

When the Open message has been sent and received and all other parameters match (like authentication), the neighbors will reach the Established state..


All the states are described in detail in the following..





Idle
This state is the initial BGP state. In Idle state, the router refuses all connection requests from neighbors.

The router starts a TCP connection with its BGP peer and goes to Connect State only after receiving a Start event from the system.

The Start event occurs when an operator configures a BGP process or resets an existing BGP process or when the router software resets a BGP process.

Connect
In this state, router starts the ConnectRetry timer and waits to establish a TCP connection.

If the TCP connection is established, the router sends an Open message to the peer and goes to the OpenSent state.

If the TCP connection fails to be established, the router moves to the Active state.

Active
In this state, the router keeps trying to establish a TCP connection with the peer.

If the TCP connection is established, the router sends an Open message to the peer, stops the ConnectRetry timer, and changes to the OpenSent state.

If the TCP connection fails to be established, the router stays in the Active state.

If the router does not receive a response from the peer before the ConnectRetry timer expires, the BGP device returns to the Connect state.

OpenSent
In this state, the router waits for an Open message from the peer and then checks the validity of the received Open message, including the AS number, version, and authentication password.

If the received Open message is valid, the router sends a Keepalive message and changes to the OpenConfirm state.

If the received Open message is invalid, the router sends a Notification message to the peer and returns to the Idle state.

OpenConfirm
In this state, the router waits for a Keepalive message or Notification message from the peer.

If the router receives a Keepalive message, it goes to the Established state. If it receives a Notification message, it returns to the Idle state.

Established
In this state, the router exchanges Update, Keepalive, Route-refresh, and Notification messages with the peer.

If the router receives a valid Update or Keepalive message, it considers that the peer is working properly and maintains the BGP connection with the peer.

If the router receives an invalid Update or Keepalive message, it sends a Notification message to the peer and returns to the Idle state.

If the router receives a Route-refresh message, it does not change its status.

If the router receives a Notification message, it returns to the Idle state.

Also, if the router receives a TCP connection termination notification, it terminates the TCP connection with the peer and returns to the Idle state.
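
The state descriptions above can be condensed into a transition table. The Python sketch below is a simplification for study purposes: the event names are invented labels for the conditions described in the text, timers are not modelled, and events that are not listed for a state are simply ignored.

```python
TRANSITIONS = {
    ("Idle", "Start"): "Connect",
    ("Connect", "TcpEstablished"): "OpenSent",
    ("Connect", "TcpFailed"): "Active",
    ("Active", "TcpEstablished"): "OpenSent",
    ("Active", "TcpFailed"): "Active",
    ("Active", "ConnectRetryExpired"): "Connect",
    ("OpenSent", "ValidOpen"): "OpenConfirm",
    ("OpenSent", "InvalidOpen"): "Idle",
    ("OpenConfirm", "Keepalive"): "Established",
    ("OpenConfirm", "Notification"): "Idle",
    ("Established", "ValidUpdate"): "Established",
    ("Established", "Keepalive"): "Established",
    ("Established", "RouteRefresh"): "Established",
    ("Established", "InvalidMessage"): "Idle",
    ("Established", "Notification"): "Idle",
    ("Established", "TcpClosed"): "Idle",
}


def run(events, state="Idle"):
    """Feed a list of events through the table and return the final state."""
    for event in events:
        # Simplification: an event not listed for the current state is ignored.
        state = TRANSITIONS.get((state, event), state)
    return state


print(run(["Start", "TcpEstablished", "ValidOpen", "Keepalive"]))  # Established
print(run(["Start", "TcpFailed", "TcpFailed"]))                    # Active
```

When a neighbor is stuck, the state it is stuck in tells you which row of the table is failing, for example a session flapping between Connect and Active points at TCP reachability rather than at BGP parameters.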


Note:- The BGP message types involved are;
1) Open (AS number, version, authentication password)
2) Keepalive
3) Notification
4) Update
5) Route-refresh

image sources: http://support.huawei.com

Thursday, September 7, 2017

How to Read NAT Rules on ASA 8.2 and Older

If you are upgrading your ASA from version 8.2 or older to newer code, you may have to worry about the NAT rules. The reason is that the older versions of ASA depended on NAT rules to forward traffic. The newer ones don't.. So you will have to understand how to read those rules and what they were doing before migrating to the new ASA..

Let's see how the different types of NAT are configured in CLI in older ASAs..

Dynamic NAT & PAT

nat (inside) 1 10.0.0.0 255.255.255.0
global (outside) 1 192.168.1.10-192.168.1.100
global (outside) 1 192.168.1.101

1st statement is the match statement for incoming traffic.. It says that "If the source is coming from INSIDE interface & if the source IP is in the 10.0.0.0/24 range, put it into the NAT group 1"

The 2nd and 3rd statements here are the action statements for the outgoing traffic..
The 2nd statement says that NAT group 1 should be translated to the pool starting at 192.168.1.10 and ending at 192.168.1.100 when it is going out of the OUTSIDE interface..
The 3rd statement says that NAT group 1 should be translated (PAT/NAT overload) to 192.168.1.101 when it is going out of the OUTSIDE interface..
This 3rd rule applies to the traffic after the dynamic pool is exhausted because the command is entered after the dynamic NAT statement (2nd)..
For the 3rd statement, you can give an interface instead of an IP address too..
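
The way the NAT ID ties a match statement to its action statements can be shown with a small Python reader. This is a hypothetical helper for studying old configs, not a Cisco tool; it only understands the two statement forms used in this example.

```python
import ipaddress
import re

CONFIG = """\
nat (inside) 1 10.0.0.0 255.255.255.0
global (outside) 1 192.168.1.10-192.168.1.100
global (outside) 1 192.168.1.101
"""

LINE = re.compile(r"(nat|global) \((\w+)\) (\d+) (.+)")


def read_rules(config):
    """Group 'nat' (match) and 'global' (action) lines by their NAT ID."""
    groups = {}
    for line in config.splitlines():
        found = LINE.match(line.strip())
        if not found:
            continue
        kind, interface, nat_id, rest = found.groups()
        group = groups.setdefault(int(nat_id), {"match": [], "translate": []})
        if kind == "nat":
            network, mask = rest.split()
            source = ipaddress.ip_network(f"{network}/{mask}")
            group["match"].append((interface, source))
        else:
            # A range means a dynamic NAT pool, a single address means PAT.
            mode = "dynamic NAT pool" if "-" in rest else "PAT"
            group["translate"].append((interface, mode, rest))
    return groups


rules = read_rules(CONFIG)
for nat_id, group in rules.items():
    print(nat_id, group["match"], group["translate"])
```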

Static NAT

static (dmz,outside) 192.168.1.175 172.16.0.5
static (dmz,inside) 172.16.0.5 172.16.0.5

Above are 2 static NAT rules..

1st one says when the traffic is moving between DMZ interface (source) and OUTSIDE interface (destination), the source IP 172.16.0.5 should be translated to 192.168.1.175..

The 2nd line translates the address to the same IP, which is called "Identity NAT"..


NAT 0 Policy

In older ASAs, NAT could be mandatory. With NAT control enabled, traffic has to match a NAT rule before it is allowed through. (This is somewhat like the implicit deny rule at the end of ACLs) In some versions you can disable it using the "no nat-control" command. Anyhow, it is no longer there from version 8.3 onwards..

In older versions, if you are not disabling it, you have to turn off NAT explicitly for the specific traffic you don't want translated. As an example, you will need to turn off NAT for the IPs of the hosts reached over IPSec site-to-site VPNs..
This can be achieved by using an ACL with the nat 0 policy..

access-list NONAT extended permit ip any 57.234.195.128 255.255.255.192
nat (inside) 0 access-list NONAT

The 1st line is just an ACL which identifies the traffic.
The 2nd line says that if traffic coming in on the INSIDE interface matches the NONAT ACL, put it in NAT group 0, which does not do NAT..

This traffic will be added to the NAT rules section in the ASDM as "Exempt" rules, which means this traffic is exempted from translation.. You can see the above NAT rule at the 23rd line.


Monday, September 4, 2017

Port Address Translation (NAT Overload / Masquerading)

Flavor of dynamic NAT that maps multiple private IP addresses to a single public IP address using different ports..






















Let's configure PAT on R1;

Define inside & outside..

R1(config)#int e0/0
R1(config-if)#ip nat outside

R1(config)#int e0/1
R1(config-if)#ip nat inside

Create an access list that matches the private IP range..
R1(config)#access-list 10 permit 192.168.1.0 0.0.0.255

Do the mapping..
R1(config)#ip nat inside source list 10 interface e0/0 overload

As soon as you enter the above commands, you will not see anything in the NAT translations & routing table like you do with static NAT.. But when traffic is generated, the translations will start to populate..

When PC-1 and PC-2 are both pinging the server 203.115.41.221, the following will be the output..






Inside local address – The private IP address assigned to a host in the inside network.
Inside global address – The public IP address which represents a host in the inside network.
Outside local address – The public IP address of a host in the outside network as it is seen to the hosts in the inside network.
Outside global address – The public IP address which represents a host in the outside network.

The above terms are local to the router.. The Inside and Outside terms are adopted from the router's interface definitions (inside NAT interface & outside NAT interface).

Unlike in Static NAT or Dynamic NAT, you will not see a new entry for the public IP address in the routing table either..
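
How several inside hosts share one public address can be sketched with a small translation table in Python. This is a toy model: the `PatRouter` class is hypothetical, the public address 203.115.41.1 is an assumed address for the outside interface, and ports are handed out sequentially from 1024, which is a simplification of what a real router does.

```python
import itertools


class PatRouter:
    """Toy PAT table: many inside locals share one inside global address."""

    def __init__(self, public_ip, first_port=1024):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)
        self.table = {}  # (inside local IP, port) -> (inside global IP, port)

    def translate(self, src_ip, src_port):
        key = (src_ip, src_port)
        if key not in self.table:
            # The entry is created only when traffic shows up.
            self.table[key] = (self.public_ip, next(self._ports))
        return self.table[key]


r1 = PatRouter("203.115.41.1")  # assumed outside interface address
pc1 = r1.translate("192.168.1.11", 5000)
pc2 = r1.translate("192.168.1.12", 5000)
print(pc1)  # ('203.115.41.1', 1024)
print(pc2)  # ('203.115.41.1', 1025)
```

Both hosts use the same source port on the inside, yet the return traffic can still be told apart because the router keys the table on the translated port.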

Dynamic Network Address Translation

Maps local addresses to a pool of global addresses..
Needs one real public IP address for every private IP address that is translated at the same time..
Cannot permanently bind a public IP address to a host like in static NAT..
When the pool is exhausted, the router cannot create a new translation and drops the packet..






















Let's configure dynamic NAT on R1..

Define inside & outside..

R1(config)#int e0/0
R1(config-if)#ip nat outside

R1(config)#int e0/1
R1(config-if)#ip nat inside

Create an access list that matches the private IP range..
R1(config)#access-list 10 permit 192.168.1.0 0.0.0.255

Create a pool for public IP range..
R1(config)#ip nat pool DYNAMIC 203.115.41.110 203.115.41.120 netmask 255.255.255.0

Do the mapping..
R1(config)#ip nat inside source list 10 pool DYNAMIC

As soon as you enter the above commands, you will not see anything in the NAT translations & routing table like you do with static NAT.. But when traffic is generated, the entries will start to populate..
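
The on-demand behaviour and the pool exhaustion can be modelled in a few lines of Python. The `DynamicNat` class is a hypothetical illustration that reuses the pool range configured above; the inside host addresses are made up, and timeouts that would return addresses to the pool are not modelled.

```python
import ipaddress


class DynamicNat:
    """Toy dynamic NAT: one pool address per inside host, first come first served."""

    def __init__(self, first, last):
        start = int(ipaddress.ip_address(first))
        end = int(ipaddress.ip_address(last))
        self.free = [str(ipaddress.ip_address(n)) for n in range(start, end + 1)]
        self.table = {}  # inside local -> inside global

    def translate(self, inside_local):
        if inside_local in self.table:
            return self.table[inside_local]
        if not self.free:
            return None  # pool exhausted, no translation can be created
        self.table[inside_local] = self.free.pop(0)
        return self.table[inside_local]


nat = DynamicNat("203.115.41.110", "203.115.41.120")
results = [nat.translate(f"192.168.1.{host}") for host in range(1, 13)]
print(results[0])   # 203.115.41.110
print(results[10])  # 203.115.41.120
print(results[11])  # None
```

The pool holds 11 addresses, so the 12th host gets no translation until an existing entry is cleared.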

When PC-1 is pinging the server 203.115.41.221, the following will be the output.






Inside local address – The private IP address assigned to a host in the inside network.
Inside global address – The public IP address which represents a host in the inside network.
Outside local address – The public IP address of a host in the outside network as it is seen to the hosts in the inside network.
Outside global address – The public IP address which represents a host in the outside network.

The above terms are local to the router.. The Inside and Outside terms are adopted from the router's interface definitions (inside NAT interface & outside NAT interface).

You will also see a new entry for the public IP address in the routing table..
Note that this entry is removed when you clear the ip nat translations..


Sunday, September 3, 2017

Static Network Address Translation

One-to-one mapping between local and global addresses..
Need to have one real public IP address for every private IP address..
Used with servers mostly..






















Let's configure static NAT on R2 where the servers are..

Define inside & outside..

R2(config)#int e0/0
R2(config-if)#ip nat outside

R2(config)#int e0/1
R2(config-if)#ip nat inside

Do the mapping..
R2(config)#ip nat inside source static 192.168.2.21 203.115.41.221
R2(config)#ip nat inside source static 192.168.2.22 203.115.41.222

As soon as you enter the above commands, you will see the following output on nat translations..






If you look into the routing table, you will see that the public IP addresses are taken into the routing table like the following.. They will not go away even if you clear the NAT translations..



















When the servers are generating traffic destined to outside of their network (ex:- pinging to 203.115.41.111 which is actually the PC1 from 203.115.41.221), you will see the following output..







But when an outside host tries to reach the servers, you will see something like the following..
(pinging from 203.115.41.111 which is actually the PC1 to 203.115.41.221)







As you can see, both outputs are the same..

Inside local address – The private IP address assigned to a host in the inside network.
Inside global address – The public IP address which represents a host in the inside network.
Outside local address – The public IP address of a host in the outside network as it is seen to the hosts in the inside network.
Outside global address – The public IP address which represents a host in the outside network.

The above terms are local to the router.. The Inside and Outside terms are adopted from the router's interface definitions (inside NAT interface & outside NAT interface).

Saturday, September 2, 2017

Installing Active Directory, DNS and DHCP on Windows Server 2012 R2

As a network engineer you may need some idea of these basic services running in enterprise environments. If you want to install Windows Server 2012 with a basic understanding of the common terms, you may need to go through the following posts..


If you haven't changed the server name after installation, go to Server Manager > Local Server 


Click on the Computer Name and give a name of your choice & restart the server..






Before installing the services like DHCP & DNS, you will have to assign an IP address to the network interface like the way you do in your Windows PC.

To install Active Directory, DNS and DHCP; click on the Manage > Add roles & features on the Server Manager dashboard.

It will open the "Add Roles & Features" wizard. Basically you will only need to hit Next until you are asked to select Server Roles..


Select the roles and hit Next all the way to Install. 

When adding roles, it will ask about the features, mostly you will have to continue with Next..

































After the installation process completes, you will need to do the 2 things which are marked in blue color on the results page. First, click on Promote this server to a domain controller..
Because this is a clean installation (no domain nor forest), I am selecting Add a new forest & giving the Root domain name as roshanznet.local














Give the DSRM password on the next page and click Next..
For the next pages, you will mostly hit Next until you find the page to Install..

After the reboot you will see a yellow flag icon on the Server Manager dashboard asking you to complete the DHCP configuration. For a basic setup it will mostly be just a few clicks on Next..

Implementing ITIL v3 Framework in a Small Network Operations Center

Adopting a best practice framework is crucial to improve the quality of any IT service delivered to a customer. There are some well known frameworks designed to meet business requirements.

You can use whichever of them suits your organization and make some adjustments / customizations if needed.


The framework we chose to implement is ITIL (Information Technology Infrastructure Library), which was developed in the United Kingdom in the late 1980s and is currently at version 3, which is used by many companies around the world. I have made some customizations to this version of ITIL to match the NOC where I work and we are currently adapting to this new framework.. If you are working in a small NOC too, you should be able to implement ITIL in your workplace after reading this..

The NOC where I work is a small technical team which consists of about 15 engineers including the Team Leader. We have the Help Desk function & the L1 / L2 support functions. We give onsite support as a 3rd party contractor to the national airline at an international airport. What we do here mostly falls into 2 stages (Service Transition & Service Operation) out of the 5 stages of ITIL. Each stage groups the processes which we should follow..

The five stages of ITIL are as follows;

(1) Service Strategy
(2) Service Design
(3) Service Transition
(4) Service Operation
(5) Continual Service Improvement

Service Transition is the implementation stage while Service Operation is the monitoring & support stage.. First let's look at the original framework..
The Service Transition & Operation stages look like the following..































The objective of ITIL Service Transition is to build and deploy IT services. The Service Transition lifecycle stage also makes sure that changes to services and service management processes are carried out in a coordinated way.






















The objective of ITIL Service Operation is to make sure that IT services are delivered effectively and efficiently. The Service Operation lifecycle stage includes the fulfilling of user requests, resolving service failures, fixing problems, as well as carrying out routine operational tasks.

Because Request Fulfillment is a process handled by Service Desk in our environment, we could neglect it. However the most important aspect of ITIL is the idea of responsibilities assigned to individuals. Every process needs to be assigned a process owner to ensure that the process activities are carried out smoothly. Many can be assigned responsibilities but only one should be assigned accountability in any process.

Steps of Implementing ITIL?

(1) Study the work currently done by the employees and identify the current procedures.
(2) Study the ITIL framework.
(3) Decide in which ITIL stages the organization / team operates.
(4) Define the processes with necessary adjustments.
(5) Assign the manager roles to selected employees.

Because we are a small team, I merged some roles in Service Transition stage with some roles in Service Operation stage; so that they can cover more work while operating in both the stages. 

So I created 8 designations (manager roles) which are accountable for carrying out the above processes.

Change Manager
Operational Stage: Service Transition
Accountable Processes: Change Management, Change Evaluation
Databases Maintained: n/a
Responsibilities: This guy is accountable for the changes made to the network.

 - Creates RFCs/CRs
 - Plan maintenance windows
 - Create change schedules
 - Communicate with CAB (Change Advisory Board)
 - Categorize changes (Standard, Normal, Emergency)
 - Create emergency change plans


IT Operations & Release Manager (Team Lead)
Operational Stage: Service Transition, Service Operation
Accountable Processes: Transition Planning & Support, Release & Deployment Management, IT Operations Management, Access Management
Databases Maintained: KEDB
Responsibilities: This guy is accountable for daily work & new implementations.

 - Coordinate all the other processes as central communication point between other managers
 - Choose the technical team and lead them in implementation
 - Update KEDB after a new implementation
 - Maintain access rights
 - Create & maintain the RACI Matrix
 - Create roster


Test Manager
Operational Stage: Service Transition
Accountable Processes: Service Validation & Testing
Databases Maintained: n/a
Responsibilities: This guy is responsible for the resiliency of the network.

 - Prepare test cases
 - Perform tests
 - Produce test reports
 - Prepare user acceptance


Configuration Manager
Operational Stage: Service Transition
Accountable Processes: Service Asset and Configuration Management
Databases Maintained: IMDB, CMDB
Responsibilities: This guy is accountable for everything about network devices.

 - Keep the inventory (IMDB) up to date
 - Deal with RMAs
 - Carry out audits
 - Managing software & hardware licenses
 - Backup configurations
 - Maintain network diagrams


Knowledge & Technical Manager
Operational Stage: Service Transition, Service Operation
Accountable Processes: Knowledge Management, Technical Management
Databases Maintained: KEDB
Responsibilities: This guy is accountable for the knowledge sharing.

 - Educate other team members with technical knowledge
 - Plan & implement best practices
 - Design SKMS
 - Update KEDB with how to do notes
 - Give technical solutions for daily technical matters
 - Verify all technical documents coming through all other processes


Event Manager
Operational Stage: Service Operation
Accountable Processes: Event Management
Databases Maintained: AEDB
Responsibilities: This guy is accountable for the proactive monitoring of the network.

 - Maintain monitoring tools
 - Log alerts & events in AEDB
 - Inform about the issues to attend


Incident Manager
Operational Stage: Service Operation
Accountable Processes: Incident Management
Databases Maintained: IRDB
Responsibilities: This person is accountable for all the incidents assigned.

 - Log issues in IRDB
 - Deal with Service Desk
 - Handle customer employees
 - Carry out normal changes to the network
 - Escalate issues to relevant parties


Problem Manager
Operational Stage: Service Operation
Accountable Processes: Problem Management
Databases Maintained: KEDB
Responsibilities: This guy is accountable for problem identification & mitigation.

 - Analyze and diagnose the problems (recurring issues)
 - Execute root cause analysis
 - Update KEDB with work arounds to problems
 - Coordinate with service providers
 - Follow up L2 support for problems
 - Follow up L3 support for problems
 - Follow up the life cycle of the problems
 - Create & maintain the Escalation Matrix 


Those above roles are all the IT service management roles we have.
The database components in the SKMS (Service Knowledge Management System) are as follows..



AEDB - Alerts & Events Database

IRDB - Incident Records Database

IMDB - Inventory Management Database

CMDB - Configuration Management Database

KEDB - Known Errors Database



Communication & Escalation Protocol?

All the written communication will be carried out via emails. Every manager will send mails directly to the IT Operations & Release manager + all the managers who are directly responsible for the matter for every issue raised through their processes. Additionally all the team members (not only managers) should be copied in the mailing list. Because we only have about 15 members in total, it is OK to put every one in the list.

RACI Matrix

This document is created by the IT Operations & Release Manager and defines the groups and roles that are responsible for performing each defined activity.

Here is an example matrix format..

R for Responsible: These are the people who execute the work.
A for Accountable: This is the person who is ultimately in charge of the results / outcome, usually an executive.
C for Consult: These are the people in related fields with whom we keep two-way communication, consulting them for problem solving and improvement.
I for Informed: These are the people who should receive one-way communication (e.g. a report).
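As a rough illustration of the R/A/C/I letters, here is a minimal Python sketch that stores a RACI matrix as a dictionary and checks it. The activities, the assignments and the roles "Monitoring Engineer" and "L2 Support" are hypothetical examples, not the real matrix. The two rules it checks (exactly one Accountable and at least one Responsible per activity) are assumptions as well.

```python
VALID_LETTERS = {"R", "A", "C", "I"}

def check_raci(matrix):
    """Return a list of problems found in a RACI matrix.

    matrix maps activity -> {role: letter}.
    Assumed rules (not stated in the note): every activity has exactly
    one Accountable role and at least one Responsible role.
    """
    problems = []
    for activity, assignments in matrix.items():
        letters = list(assignments.values())
        for role, letter in assignments.items():
            if letter not in VALID_LETTERS:
                problems.append(f"{activity}: {role} has unknown letter {letter!r}")
        if letters.count("A") != 1:
            problems.append(f"{activity}: expected exactly one A, found {letters.count('A')}")
        if "R" not in letters:
            problems.append(f"{activity}: nobody is Responsible")
    return problems

# Hypothetical example matrix, for illustration only
example = {
    "Log alerts & events in AEDB": {
        "Monitoring Engineer": "R",   # hypothetical role
        "Event Manager": "A",
        "Incident Manager": "I",
    },
    "Execute root cause analysis": {
        "L2 Support": "R",            # hypothetical role
        "Problem Manager": "A",
        "Incident Manager": "C",
        "IT Operations & Release Manager": "I",
    },
}

print(check_raci(example))  # an empty list means no problems were found
```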



Escalation Matrix

This document is created by the Problem Manager to define when and how to escalate problems beyond the Incident Manager's normal changes. The escalation procedure will be carried out by the Incident Manager and followed up by the Problem Manager.

Framework Customization Summary:

(1) The newly introduced 'Event Manager' role is dedicated to proactive monitoring.
(2) The IT Operations Management and Release & Deployment Management processes are combined into one role, 'IT Operations & Release Manager', which also takes on the accountabilities of the Transition Planning & Support and Access Management processes.
(3) The Knowledge Management & Technical Management processes are combined into one role, 'Knowledge & Technical Manager'.
(4) A new database, 'IMDB', is created for inventory, spares, RMA & license management.
(5) The tasks of maintaining backups and creating backup plans are removed from IT Operations and assigned to the Configuration Manager.

Friday, September 1, 2017

Change Static Routes using IP SLA & Track Objects

IP SLA (Service Level Agreement) allows us to generate traffic that can be used to measure delay/latency, jitter etc. When it is used with object tracking, we can check the reachability of an IP address (by pinging it) or of a certain service (by connecting to it using TCP).
If the IP address/service is unreachable, we can trigger a certain action..

This note explains how to configure IP SLA with track objects to change a route..

Let's assume our router is R1. We have 2 internet links from 2 service providers..
For this lab, let's assume that circuit 1 runs from R1 to R4 & circuit 2 runs from R1 to R5..

Requirement:-
(1) We want to route all traffic to internet via ISP-1 as the primary path.
(2) If ISP-1 cannot provide a circuit with an RTT of 100 ms or less, change the path to ISP-2.

Assuming ISP routing & other basic configurations work well;

IP SLA configuration is as follows..

R1(config)#ip sla 1
R1(config-ip-sla)#icmp-echo 172.16.24.4
R1(config-ip-sla-echo)#threshold 100
R1(config-ip-sla-echo)#timeout 200
R1(config-ip-sla-echo)#frequency 1

The commands above do the following, respectively..

IP SLA entry number is 1
Target to ping is 172.16.24.4
RTT (Round Trip Time) threshold of the icmp-echo operation is 100 ms
Operation times out after 200 ms if there is no reply, and the target is then considered unreachable
Operation executes every second

Following command will start the operation from now and will run forever..
R1(config)#ip sla schedule 1 start-time now life forever

Following command will bind track object 10 with ip sla 1's return code..
R1(config)#track 10 ip sla 1

Following command will bind the static route with track object 10..
R1(config)#ip route 0.0.0.0 0.0.0.0 172.16.12.2 track 10

Following command will state the fallback route to ISP-2 with a higher administrative distance (2)
R1(config)#ip route 0.0.0.0 0.0.0.0 172.16.13.3 2

Note that without IP SLA, the route would fail over if & only if the R1-R2 link itself goes down..
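The way the two static routes interact can be sketched in Python. This is only a simplified model, not how IOS is implemented: it assumes a tracked route is a candidate only while its track object is Up, and that the lowest administrative distance wins among the candidates. It ignores interface state and next-hop reachability. The names `StaticRoute` and `active_default_route` are made up for the illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StaticRoute:
    next_hop: str
    distance: int = 1            # default administrative distance of a static route
    track: Optional[int] = None  # track object number, if the route is tracked

def active_default_route(routes, track_up):
    """Pick the default route that would be installed (simplified model).

    routes   : list of StaticRoute entries for 0.0.0.0/0
    track_up : dict mapping track object number -> True (Up) / False (Down)
    """
    # A tracked route is a candidate only while its track object is Up
    candidates = [r for r in routes if r.track is None or track_up.get(r.track, False)]
    if not candidates:
        return None
    # Among the candidates, the lowest administrative distance wins
    return min(candidates, key=lambda r: r.distance)

routes = [
    StaticRoute("172.16.12.2", track=10),    # primary path via ISP-1
    StaticRoute("172.16.13.3", distance=2),  # fallback path via ISP-2
]

print(active_default_route(routes, {10: True}).next_hop)   # 172.16.12.2
print(active_default_route(routes, {10: False}).next_hop)  # 172.16.13.3
```

While track object 10 is Up, the route with distance 1 wins. When it goes Down, only the ISP-2 route is left.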

The configuration is now complete & it will work fine..

The threshold is a boundary value applied to the operation's result (e.g. the RTT or jitter value collected during the operation). Crossing the threshold usually means an SLA contract violation.

The timeout is the maximum time allowed for the SLA operation to complete, for example the time spent waiting for a probe response.

The timeout is used directly to restart the operation. The threshold is used to activate a response to an IP SLA violation, e.g. send an SNMP trap, start a secondary SLA operation, fall back to another route etc..

Frequency > Timeout > Threshold
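The rule above is easy to check once the units are lined up: for this icmp-echo operation the frequency is configured in seconds, while the timeout and threshold are in milliseconds. Here is a small Python sketch of the check, with `check_timers` being a made-up helper name. It applies the strict inequality exactly as written above, so whether IOS itself accepts equal values is not covered here.

```python
def check_timers(frequency_s, timeout_ms, threshold_ms):
    """Return True when Frequency > Timeout > Threshold holds.

    frequency_s  : IP SLA frequency, in seconds
    timeout_ms   : IP SLA timeout, in milliseconds
    threshold_ms : IP SLA threshold, in milliseconds
    """
    frequency_ms = frequency_s * 1000  # bring everything to milliseconds
    return frequency_ms > timeout_ms > threshold_ms

# Values from this note: frequency 1, timeout 200, threshold 100
print(check_timers(1, 200, 100))   # True  (1000 > 200 > 100)

# A timeout longer than the probe interval breaks the rule
print(check_timers(1, 2000, 100))  # False
```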

Important show commands:-
R1#show ip sla summary
R1#show ip sla statistics
R1#show track brief

In normal operation, the following healthy outputs will be visible..



As you can see, the RTT is 1 ms, hence the return code is OK..
The track object will remain up..




The return code will be displayed as "Over Threshold" & the track object will be "Down" when the RTT goes over 100 ms. As soon as the RTT drops back below 100 ms, the IP SLA return code becomes OK again and the track object comes back up, changing the route back to ISP-1..
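The relation between RTT, return code and track state can be sketched in Python using the values from this lab (threshold 100 ms, timeout 200 ms). It is a simplified model with made-up function names. How a value exactly equal to the threshold or timeout is treated is an assumption, and any delay configured on the track object is not modelled. The track logic assumes `track 10 ip sla 1` in its default state mode, where only an OK return code keeps the object Up.

```python
def return_code(rtt_ms, threshold_ms=100, timeout_ms=200):
    """Classify one probe result; rtt_ms is None when no reply arrived."""
    if rtt_ms is None or rtt_ms > timeout_ms:
        return "Timeout"
    if rtt_ms > threshold_ms:
        return "Over threshold"
    return "OK"

def track_state(code):
    """State tracking: only an OK return code keeps the track object Up."""
    return "Up" if code == "OK" else "Down"

# A few sample probe results: fast, slow, lost, fast again
for rtt in [1, 150, None, 40]:
    code = return_code(rtt)
    print(rtt, code, track_state(code))
# 1 OK Up
# 150 Over threshold Down
# None Timeout Down
# 40 OK Up
```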

When the R2-R4 link goes down (target unreachable), the outputs will be as follows..
(The 1st show ip route is from when everything is OK)