SERVICE LEVEL AGREEMENT
CUIT Converged Infrastructure: Infrastructure Services for VM Hosting

Version: 1.0
Author: CUIT
Date: August 2015

Table of Contents
1.0 Document Change History
2.0 Overview
    2.1 SLA Introduction
    2.2 SLA Purpose
    2.3 SLA Duration & Parameters
3.0 Service Description, Virtual Servers & Targets
    3.1 Virtual Server Options
    3.2 Service Availability Targets
    3.3 Service Maintenance
        Maintenance Windows: Standard, Non-Standard, Emergency
        Maintenance Notifications/Announcements
    3.4 Service Continuity
    3.5 Service Scalability
    3.6 Service Security
4. Service Management, Support & Escalation
    4.1 Support Hours
    4.2 Support Phone Contact
    4.3 Service Desk
    4.4 Incident and Major Incident Management
    4.5 Change Management
    4.6 Handling and Response times
    4.7 Escalation Requests and Procedures
5. Monitoring Service Performance
    5.1 Process & Procedural Responsibilities
    5.2 Service Reporting
    5.3 Performance Review
6. Conditions of Services Provided
    6.1 Standards and Policies
    6.2 Responsibilities & Exclusions
7. SLA Signatures
APPENDICES
    APPENDIX A: CUIT Service Owner & Key Business Stakeholders
    APPENDIX B: Availability Percentages
    APPENDIX C: Request fulfillment times
    APPENDIX D: Priority Definitions
    APPENDIX E: CUIT Technical Standards and Policies
    APPENDIX F: Pricing Model
    APPENDIX G: General

1.0 Document Change History

Version  Date        Updated By   Changes to this version
1.0      8/13/2015   Sat Persaud  Document creation
1.1      11/17/2015  Sat Persaud  Types of VMs offered; access restrictions

Last Reviewed Date: 11/17/2015
Next Scheduled Review Date: 8/13/2016

2.0 Overview

2.1 SLA Introduction
This Service Level Agreement, henceforth also known as “SLA,” is between Columbia University Information Technology (CUIT/Service Provider), henceforth also known as “CUIT,” and the client for all services and service levels in connection to the Virtual Machine (VM) Hosting service, henceforth also known as “Service.”

2.2 SLA Purpose
The purpose of this SLA is to set expectations for the provision of the Service as it is defined herein with regard to:
o Requirements for VMs that will be hosted
o Criteria that will be used to measure the Service
o Agreed service level targets that are the minimum performance requirement
o Roles and responsibilities of CUIT and Client
o Escalation contacts
o Associated and supporting processes as well as any deviations

2.3 SLA Duration & Parameters
This section defines the duration and describes the rules regarding renewal, modification, amendment, and termination of the SLA:
1. This SLA between CUIT and Client is effective as of the date the VM is provisioned and will expire as of the date it is deprovisioned.
2. This SLA will automatically renew unless CUIT and Client mutually agree to another arrangement.
3. A review of this SLA by CUIT and Client may be conducted, if requested, a minimum of 30 days before the expiration date of the agreement.
   Modification requests must be submitted in writing via email to the CUIT Service Owner for Infrastructure Services (see Appendix A).
4. Any amendments, modifications, or other terms outside those stated herein must be agreed upon by both parties.
5. The Client is responsible for providing CUIT with details of any current or future projects that may impact the provision of this SLA.
6. Service Extensions: Any requests to extend the hours of service on an ad hoc basis for a given day must be made to the Infrastructure Services team at the earliest opportunity. Failure to submit a request for a service extension will mean that the service will not be guaranteed beyond the hours defined by this SLA.
7. SLA Termination: Both parties must agree to any termination arrangement. CUIT requires a minimum of 90 days’ notice regarding early termination of the SLA.

3.0 Service Description, Virtual Servers & Targets

3.1 Virtual Server Options
This service hosts a virtual machine on CUIT’s converged infrastructure platform with one of the following systems:
● VM with RHEL 6.x
● VM with Windows Server 2008 R2
CUIT is responsible for maintaining the underlying hardware, network and storage to meet the service targets indicated below.
● For a VM with RHEL or Windows OS installed, CUIT will manage up to the OS layer. The client is responsible for the support of the application residing on the virtual server.
Please refer to Appendix F for CPU, memory and storage configuration options and pricing.

3.2 Service Availability Targets
The Service for hosting production instances will be available to Clients on a 24x7 basis except for maintenance windows or other scheduled or application-specific maintenance outlined herein. It is our aim to ensure that the services supporting the Service are reliable in terms of availability and performance.
Therefore, we will measure reliability using Mean Time Between Failures (MTBF), computed as the average time between failures, by month and by year. CUIT will strive to achieve an MTBF of at least 100 days. A failure is defined as any infrastructure-related incident causing the Service to be unavailable. This can also include severe performance degradation.
The target availability of VM Hosting is 99.9% (see Appendix B).

3.3 Service Maintenance
Maintenance includes but is not limited to: adding/removing/replacing hardware on servers or network, bringing new servers online, patching servers/workstations/network devices, installing new/updated software on servers/workstations/network devices, etc. The network and/or systems will be interrupted only if it is absolutely necessary.

Maintenance Windows: Standard, Non-Standard, Emergency
A Standard Maintenance Window has been established for all CUIT services, including the Service, between the hours of 02:00 and 06:00 ET on the third Sunday of each month. If there is a need for a change outside the hours of the Standard Maintenance Window, the resulting Non-Standard Maintenance Window will require formal approval from the CUIT business service owner. It is understood that in some circumstances, Emergency Maintenance Windows will be required.

Maintenance Notifications/Announcements
CUIT will announce all Maintenance Windows (Standard, Non-Standard, and Emergency), including which services will be affected and approximate durations, in the following ways:
1. On the CUITalerts website: http://cuitalerts.columbia.edu
2. Via email to servicealerts@columbia.edu
   a. Client must provide CUIT with a valid designated representative or group email address to be added to the distribution list
   b. Client must notify CUIT promptly in the event of any change/update for that representative or group email address

3.4 Service Continuity
Service Continuity (Disaster Recovery) is hosted at a hot site at the NYSERNet facility in Syracuse, New York. If a disaster is declared, the Service will be recovered at the alternate location in the timescales outlined below based on how the application has been categorized. CUIT’s primary mission in the event of a disaster invocation is to recover all production applications. Once complete, critical application processing will resume within a specified period of time following the declaration of an emergency outage. DR pricing options are outlined in Appendix F.

3.5 Service Scalability
CUIT will scale the Service at the Client’s request to allow for growth and/or redundancy. However, such scalability may be capped and will require lead times as defined in Appendix F: Pricing Model.

3.6 Service Security
Services provided by CUIT will conform to CUIT’s security and data classification policies outlined at https://cuit.columbia.edu/cuit/itpolicies. If it is determined that any component of the Service is adversely impacting service availability, e.g., a Denial of Service condition, CUIT reserves the right to suspend the Service immediately until the impacting condition is remediated.
Upon provisioning of the Virtual Machine, you will receive access instructions. Please note that in order to uphold CUIT’s Security and Standard Operating Environment (SOE) policies, the following restrictions will apply:
1. CUIT will be the sole possessor of root or administrator credentials to the server
2. You will be given an account with sufficient privileges to perform basic server administration
3. Each VM will be configured with CUIT management tools including Tripwire, SumoLogic and Puppet, which cannot be disabled and which must remain operational so that we can manage the environment
4. Each VM will be provisioned in a designated security zone based on your initial configuration. Should you require additional access or ports, you will need to submit a Service Request ticket
5. Virtual machines will be accessible via secure methods only, e.g., via SSH, designated jump hosts or using Toopher authentication. Details will be provided in your access instructions
6. Failure to complete the RSAM security questionnaire within 2 weeks of the VM being provisioned may result in the machine being suspended

4. Service Management, Support & Escalation

4.1 Support Hours
Infrastructure services defined in this SLA will be supported on a 24x7x365 basis. Live technical support is available:
● 8:00am-6:00pm ET, Monday through Friday, excluding all holidays and university closures.
● Outside normal coverage hours, CUIT will work to resolve issues on a best-effort basis.

4.2 Support Phone Contact
Client can contact CUIT Service Desk for support by calling 212-854-1919.

4.3 Service Desk
CUIT will respond to all faults, queries and service requests only if a call is placed with the Service Desk. By enforcing this policy, CUIT can ensure that all faults are managed effectively and in line with the commitments of this SLA. It is imperative that any issues deemed Critical in their nature are reported to the Service Desk by phone to ensure immediate response and investigation can occur. Other issues can be reported via email to askcuit@columbia.edu. CUIT Service Desk will log, track, assign, and manage all requests, incidents, problems, and queries through CUIT’s service ticket system.
When the Service Desk cannot provide a resolution at the time of call logging, they will provide:
o Unique reference number (Incident Ticket)
o Priority assigned to the call

4.4 Incident and Major Incident Management
The purpose of the Major Incident process is to ensure that all faults and queries reported to the Service Desk are managed to minimize business impact by restoring service as soon as possible in accordance with the SLA. The following processes are employed for the management of CUIT incidents:
o Incident Management
o Major Incident Management
Priority definitions and associated resolution times have been agreed with regard to all faults reported to the Service Desk and will follow the targets outlined in Appendix D.

4.5 Change Management
All CUIT/Client proposed changes must adhere to the predefined Change Management process (see Appendix E). CUIT will take responsibility for the Request For Change (RFC) evaluation, impact assessment, risk analysis, approval and communication prior to implementations where applicable. Failure to adhere to the Change Management process will be deemed a breach of this SLA.

4.6 Handling and Response times
CUIT will work to resolve known/reported service problems and provide relevant progress reports to the Client.

Handling
● Requests for support will be fulfilled based on priorities (Critical, High, Medium, Low), which are determined by urgency and level of impact (see Appendix D).
● Response is defined as a “good faith” effort to communicate with the Client using contact information provided. Response may be via phone or voice mail, email, or personal visit.
● Response times for service requests are measured once a request is submitted via the CUIT issue tracking system. Other forms of contact, such as direct email/phone/other contact with individual support personnel, may negatively affect the ability of CUIT to meet the requests in a timely fashion.

Response Times
Response will be driven by the Priority assigned to the Service as defined in this SLA (see Appendix D).
Note: Complex service and support requests involving the procurement/installation of new equipment, coordination with third parties, etc., may require additional effort and time to resolve.

4.7 Escalation Requests and Procedures
1. In the event service is unsatisfactory, the Client will contact the CUIT Service Owner identified in Appendix A to request escalation of an incident/problem/request.
2. If needed, a joint meeting between the Client and CUIT will be convened to discuss and resolve issues to restore services to satisfactory levels.
3. In the event that additional escalation is determined to be necessary, CUIT will escalate to its Senior Leadership Team for a resolution.
4. CUIT may periodically request your feedback.

5. Monitoring Service Performance

5.1 Process & Procedural Responsibilities
CUIT will ensure procedures exist to measure and monitor the level of service provided against the defined service targets. CUIT’s services are aligned with ITIL Best Practice service management methodologies with regard to Service Desk, Incident Management, Problem Management (PIR), and Change Management.

5.2 Service Reporting
Beginning in calendar year 2016, CUIT plans to provide quarterly utilization (capacity), performance, and availability reports for the Service. Client can request reporting on additional parameters; however, such parameters will be included at the sole discretion of CUIT based on impact and resources needed.

5.3 Performance Review
Periodic Service Level Review (SLR) meetings will be established for all stakeholders. The primary goals of the meetings will be to review performance against service targets and to agree on any remedial action as appropriate. SLR meetings will provide an opportunity to discuss organizational, operational and strategic changes.
CUIT will continually monitor, review and, if necessary, act upon the service performance against the Service Level as defined within this SLA.

6. Conditions of Services Provided

6.1 Standards and Policies
The operation of this SLA will be subject to CUIT’s policies and standards outlined at https://cuit.columbia.edu/cuit/itpolicies. In the event of any changes that may have an impact on the performance of the Service, CUIT will inform the Client at least 3 business days (per the defined Change Management policy) prior to any change. See Appendix E.

6.2 Responsibilities & Exclusions
Both parties agree to act with good intentions (see Appendix G).

CUIT Responsibilities
1. CUIT shall provide the services identified in the SLA and shall ensure the services are maintained at all times and to agreed predefined standards. CUIT agrees to exercise professional care and diligence in the discharge of all the services and to comply in all respects with relevant standards.
2. CUIT will act as owner, supplier, maintainer, and supporter of the herein identified and defined CUIT Services that have been requested/required by the Client, except where CUIT has employed third parties who will assume those responsibilities.
3. CUIT will be responsible for day-to-day management of the SLA and will liaise with the Client to ensure that information flows freely between both parties.
4. CUIT will follow established internal processes/procedures and adhere to policies and standards.
5. CUIT will not make changes to the systems/services offered without prior notification and Client approval through the defined Change Management process.
6. CUIT will inform the Client in the event of any incident likely to affect the availability or performance of their applications.

CUIT Exclusions
7. CUIT is not responsible for unsupported configurations that deviate from our technology standards unless an explicit exemption has been granted.
8. CUIT is not responsible for services for which it has no formal support agreements or contracts relating to service availability and incident response or fix times on IT/Network components which are the responsibility of an external vendor.

Client Responsibilities
9. Client shall provide all necessary information, assistance and instructions in a manner that enables CUIT to meet performance standards, for example, by giving adequate notice and disclosing all known relevant information.
10. Client is required to ensure attendance/participation at Major Incident and Problem review meetings as requested by CUIT to assist with the definition of service impact.
11. Client is required to advise the appropriate CUIT team if the requirements of the business change and the need for a review of the SLA is identified.
12. Client is required to report all issues, queries and requests via appropriate channels and processes.

APPENDICES

APPENDIX A: CUIT Service Owner & Key Business Stakeholders
This document will be distributed as follows; each name on the distribution list has been identified as a Key Business Stakeholder.

Name         Job Title                                  SLA Role                                Contacts
Client
Jim Bossio   AVP Infrastructure Services                CUIT Service Owner                      jim.bossio@columbia.edu, (212) 851-2184, (917) 993-0163
Sat Persaud  Director, Infrastructure Support Services  Infrastructure Services Representative  spersaud@columbia.edu, (212) 854-4989, (917) 731-0255

APPENDIX B: Availability Percentages
Availability is expressed as a percentage of uptime in a given year. The following table shows the downtime that will be allowed for a particular percentage of availability, presuming that the system is required to operate on a continuous 7x24 basis. The table shows the translation from a given availability percentage to the corresponding amount of time a system would be unavailable per year or month.
NOTE: If an application will not require 7x24 availability, these examples do not apply.
In such cases, CUIT will negotiate with the application owner for allowable downtime.
Availability is calculated using the following formula:

Availability % = (Promised uptime - actual downtime) / Promised uptime x 100

where promised uptime is exclusive of maintenance windows. The downtime figures below assume a 365-day year and a 30-day month.

Availability for 7x24  Downtime per year (Days H:M:S)  Downtime per month (Days H:M:S)
95%                    18 days 6:00:00                 1 day 12:00:00
96%                    14 days 14:24:00                1 day 4:48:00
97%                    10 days 22:48:00                0 days 21:36:00
98%                    7 days 7:12:00                  0 days 14:24:00
99%                    3 days 15:36:00                 0 days 7:12:00
99.10%                 3 days 6:50:24                  0 days 6:28:48
99.20%                 2 days 22:04:48                 0 days 5:45:36
99.30%                 2 days 13:19:12                 0 days 5:02:24
99.40%                 2 days 4:33:36                  0 days 4:19:12
99.50%                 1 day 19:48:00                  0 days 3:36:00
99.60%                 1 day 11:02:24                  0 days 2:52:48
99.70%                 1 day 2:16:48                   0 days 2:09:36
99.80%                 0 days 17:31:12                 0 days 1:26:24
99.90%                 0 days 8:45:36                  0 days 0:43:12
99.95%                 0 days 4:22:48                  0 days 0:21:36
99.99%                 0 days 0:52:33.60               0 days 0:04:19.20
99.999%                0 days 0:05:15.36               0 days 0:00:25.92

APPENDIX C: Request fulfillment times

Infrastructure component  Fulfillment time
Compute Power (vCPU)      1 business day
Memory (GB)               1 business day
Storage allocation (TB)   1 business day
Storage tier relocation   1 business day

APPENDIX D: Priority Definitions
The following priority definitions and associated resolution times have been agreed with regard to all faults reported to the CUIT Service Desk:

Priority  Response    Response Targets  Resolution  Resolution Targets
Critical  30 minutes  95%               2 hours     80%
High*     2 hours     90%               12 hours    80%
Medium*   12 hours    80%               24 hours    75%
Low*      24 hours    75%               56 hours    70%

* Represents Business Hours Only (e.g., an 8-hour resolution for a High Priority Incident that is reported at 12:30pm ET on a Tuesday will continue until 12:30pm on Wednesday).
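
As an illustration of how the targets in the table above could be checked against ticket data, the following is a minimal sketch, not CUIT's actual reporting method. It assumes that each "Targets" percentage is the share of incidents that must meet the corresponding time limit, and that durations for the starred priorities have already been converted to business hours. The names `Target`, `TARGETS`, `pct_within` and `meets_targets` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    response_hours: float    # time limit for the initial response
    response_pct: float      # share of incidents that must meet it
    resolution_hours: float  # time limit for resolution
    resolution_pct: float    # share of incidents that must meet it


# Values copied from the Appendix D table (30 minutes = 0.5 hours).
TARGETS = {
    "Critical": Target(0.5, 95.0, 2.0, 80.0),
    "High": Target(2.0, 90.0, 12.0, 80.0),
    "Medium": Target(12.0, 80.0, 24.0, 75.0),
    "Low": Target(24.0, 75.0, 56.0, 70.0),
}


def pct_within(durations_hours, limit_hours):
    """Percentage of durations that are at or under the limit."""
    if not durations_hours:
        return 100.0  # assumption: no incidents means nothing was missed
    met = sum(1 for d in durations_hours if d <= limit_hours)
    return 100.0 * met / len(durations_hours)


def meets_targets(priority, response_hours, resolution_hours):
    """Return (response_ok, resolution_ok) for one priority level."""
    t = TARGETS[priority]
    response_ok = pct_within(response_hours, t.response_hours) >= t.response_pct
    resolution_ok = pct_within(resolution_hours, t.resolution_hours) >= t.resolution_pct
    return response_ok, resolution_ok
```

For example, if 19 of 20 Critical incidents received a response within 30 minutes, `pct_within` returns 95.0 and the response target is met exactly.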
Response Time: The initial period in which a CUIT SME will be assigned to the incident
Update Time: The period by which CUIT will provide a progress update to the Client
Resolution Time: The period by which CUIT will resolve the issue
CUIT will base a call’s priority on the following factors: criticality of the issue and the impact; number of users impacted; University/Business critical dates/times

APPENDIX E: CUIT Technical Standards and Policies
Enterprise Architecture Standards
https://docs.google.com/a/columbia.edu/spreadsheet/ccc?key=0AmKXCgrIY2yvdFg3REVzQ0xyM2IwbjlMZVJ4YnNiZEE#gid=15
Data Security and Classification Policy
http://policylibrary.columbia.edu/dataclassificationpolicy
Acceptable Use Policy
http://policylibrary.columbia.edu/acceptableusageinformationresourcespolicy
Change Management Policy
https://wiki.cc.columbia.edu/_media/itsm:cuit_change_management_v3_02feb2015.pdf?media=itsm:cuit_change_management_v3_02feb2015.pdf

APPENDIX F: Pricing Model

Standard CI VM Pricing

Compute Option  vCPU Count  Mem (GB)  Storage (GB)  Cost/mo
1)              2           4         50            $102.00
2)              4           6         50            $141.00
3)              6           8         50            $180.00
4)              8           8         50            $206.00

Additional Storage
Item       Description                                   Amount (GB)  Cost/mo
Increment  Additional 50 GB increment, up to 500 GB max  50           $5.00
Storage beyond the initial allocation is $0.10 per GB. No requests under 50 GB.

Backup (Optional Add-On)
Item                             Description                                   Amount (GB)  Cost/mo
Initial 50 GB of backup storage  Backup of VM with 50 GB                       50           $25.00
Increment                        Additional 50 GB increment, up to 500 GB max  50           $2.50
Backup storage beyond the initial allocation is $0.05 per GB. No requests under 50 GB.

Disaster Recovery (Optional Add-On)
DR will be a 100% add-on of the environment cost: Options 1-4 (CPU + memory + storage) + any additional storage = DR.

Sample Scenarios
Option 1              No Backup or DR      $102.00
Option 2              Add 200 GB + Backup  $196.00
Option 3 with 300 GB  With Backup & DR     $250.00
                      Disaster Recovery    $250.00
                      Total                $500.00

APPENDIX G: General
Client and CUIT are entering into this SLA to recognize the need to establish and foster a true ‘spirit of partnering’ to achieve the service outcomes desired by both parties.
Both parties undertake to innovate and combine their joint strengths, expertise and energies to underpin and promote the development of this partnering relationship and acknowledge that a mutual understanding of each partner’s objectives in relation to the goals of the service is of paramount importance. Both parties recognize the mutual need to optimize each other’s opportunities for developing and achieving continuous improvement, both in their own and each other’s performance, thereby maintaining, enhancing and improving service outcomes through the life of the SLA.
In recognition of the above, both parties undertake to adopt the following principles:
● To be open and honest with each other in all respects; this means that they will do what they say they will do, deliver what they say they will deliver and not promise either if unachievable.
● To promote transparency in all their activities; this means that they will communicate their activities in relation to managing, implementing and delivering their responsibilities and inputs to the success of the service to each other or any agency (voluntary or otherwise) outside the partnering relationship where such transparency is relevant to each other’s business or statutory obligations.
● To establish well-structured, meaningful and continuous communication; this means, for example, that meetings should be brief and not wasteful of each other’s time and resources. In addition, they should always be minuted (by partners in rotation if necessary) and the minutes relayed to the personnel distribution list as expeditiously as possible. Verbal communications should be confirmed in writing to avoid ambiguity. Direction, clarification and instructions should strictly adhere to the hierarchy of personnel set out in the SLA.
● To undertake a ‘solution-centered and proactive management’ approach to the service; this means that each shall encourage and develop a proactive approach to resolving joint and each other’s problems arising or likely to arise under the contract. Each needs to be flexible in his/her approach, welcome opportunities to share ideas, learn from experience, seek to understand why something went wrong, and strive to get it right next time.
● To adopt a joint commitment to a cycle of continuous, measurable improvement in service; this means that each will actively seek to do the right things, seek out waste and non-value-adding activities impacting on or peripheral to the service inputs in each other’s organization, drive out duplication where it exists and seek to optimize efficiency to both partners’ mutual benefit.
● To commit to the mutually agreed principles above and understand that trust between the partners can only be retained and developed if the essence and true spirit of this undertaking is owned by both partners.
● CUIT will be proactive in considering value-added services and efficiencies to be gained through this agreement.
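
The sample scenarios in Appendix F can be reproduced with a short calculation. The sketch below is illustrative only and is not an official CUIT pricing tool. It assumes that additional storage is ordered in 50 GB steps up to 500 GB, that backup storage is sized to match the additional storage, and that DR doubles the full monthly subtotal including backup, which is how the third sample scenario reaches $500.00. The function name `monthly_cost` and the constant names are hypothetical.

```python
# Monthly prices taken from the Appendix F tables.
BASE_PRICE = {1: 102.00, 2: 141.00, 3: 180.00, 4: 206.00}
STEP_GB = 50                 # storage is added in 50 GB increments
MAX_EXTRA_GB = 500           # assumed cap on additional storage
STORAGE_PER_STEP = 5.00      # each additional 50 GB of storage
BACKUP_BASE = 25.00          # backup of the VM with its initial 50 GB
BACKUP_PER_STEP = 2.50       # each additional 50 GB of backup storage


def monthly_cost(option, extra_gb=0, backup=False, dr=False):
    """Estimate the monthly cost of one VM under the stated assumptions."""
    if option not in BASE_PRICE:
        raise ValueError("option must be 1, 2, 3 or 4")
    if extra_gb < 0 or extra_gb > MAX_EXTRA_GB or extra_gb % STEP_GB != 0:
        raise ValueError("extra_gb must be a multiple of 50 between 0 and 500")

    steps = extra_gb // STEP_GB
    cost = BASE_PRICE[option] + steps * STORAGE_PER_STEP
    if backup:
        cost += BACKUP_BASE + steps * BACKUP_PER_STEP
    if dr:
        # Assumption: DR is a 100% add-on of the subtotal computed so far.
        cost *= 2
    return cost


if __name__ == "__main__":
    print(monthly_cost(1))                               # Option 1, no backup or DR
    print(monthly_cost(2, extra_gb=200, backup=True))    # Option 2 + 200 GB + backup
    print(monthly_cost(3, extra_gb=300, backup=True, dr=True))  # Option 3, backup and DR
```

Under these assumptions the three calls give 102.0, 196.0 and 500.0, matching the sample scenarios. If DR were instead computed on compute and storage only, as the prose rule can also be read, the third figure would differ.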