10. Spring Cloud Chapter 10 (Advanced): Service Degradation, Circuit Breaking and Real-Time Monitoring with Hystrix


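I. Hystrix Overview

1. Service avalanche

Suppose Service A calls Service B, which in turn calls Service C, and traffic to Service A fluctuates heavily, often spiking suddenly. Even if Service A can withstand such a burst, Service B and Service C may not.

If Service C then becomes unavailable because it cannot cope with the load, Service B's requests to C block and gradually exhaust Service B's threads, so Service B becomes unavailable too. Next, calls from Service A to B tie up more and more of A's resources, until the whole system collapses.

When the failure of one service brings down every service along the call chain in this way, we call it a service avalanche.

Service degradation and circuit breaking can both be seen as ways of preventing a service avalanche.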

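2. Circuit breaking

When a downstream service suddenly becomes unavailable or responds too slowly for some reason, the upstream service, to protect its own availability, stops calling the target service and returns immediately, releasing its resources quickly.

Once the target service recovers, calls resume.

Many circuit breakers are popular today, for example Sentinel from Alibaba (to be introduced in a later post) and Hystrix, the most widely used one.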

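The relevant Hystrix settings are:

## Minimum number of requests in the rolling statistics window (10 s by default) before the breaker can trip; default 20

circuitBreaker.requestVolumeThreshold

## How long the breaker stays open before it lets a trial request through; default 5000, i.e. 5 s

circuitBreaker.sleepWindowInMilliseconds

## Error percentage at or above which the breaker trips; default 50%

circuitBreaker.errorThresholdPercentage

With these defaults, if at least 20 requests arrive within the 10-second window and 50% or more of them fail, the circuit breaker opens. While it is open, calls to the service fail fast (the fallback is returned) without contacting the remote service. After 5 s the breaker becomes half-open and lets one trial request through: if it succeeds the breaker closes, otherwise it stays open for another 5 s.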

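In short:

Like a fuse: once a service has reached its maximum load, further access is refused outright (the power is cut), and the degradation method is called to return a friendly message.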
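The closed, open and half-open cycle described above can be sketched in plain Java. This is a simplified model under stated assumptions, not Hystrix's implementation: the class name SimpleCircuitBreaker and its injectable clock are hypothetical, counts accumulate since the last reset instead of over Hystrix's 10-second rolling window, and the class is not thread-safe. The demo uses the Hystrix defaults of 20 requests, 50% and 5000 ms.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical, simplified circuit breaker mirroring three Hystrix settings:
// requestVolumeThreshold, errorThresholdPercentage and sleepWindowInMilliseconds.
// Simplifications: no rolling time window, not thread-safe.
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int requestVolumeThreshold;
    private final int errorThresholdPercentage;
    private final long sleepWindowMillis;
    private final LongSupplier clock; // injectable time source in milliseconds

    private State state = State.CLOSED;
    private int total;
    private int failures;
    private long openedAt;

    SimpleCircuitBreaker(int requestVolumeThreshold, int errorThresholdPercentage,
                         long sleepWindowMillis, LongSupplier clock) {
        this.requestVolumeThreshold = requestVolumeThreshold;
        this.errorThresholdPercentage = errorThresholdPercentage;
        this.sleepWindowMillis = sleepWindowMillis;
        this.clock = clock;
    }

    State state() { return state; }

    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < sleepWindowMillis) {
                return fallback.get(); // open: short-circuit without calling the service
            }
            state = State.HALF_OPEN; // sleep window elapsed: allow one trial request
        }
        try {
            T result = action.get();
            record(true);
            return result;
        } catch (RuntimeException e) {
            record(false);
            return fallback.get();
        }
    }

    private void record(boolean success) {
        if (state == State.HALF_OPEN) {
            if (success) {
                state = State.CLOSED; // trial succeeded: close and count afresh
                total = 0; failures = 0;
            } else {
                trip(); // trial failed: stay open for another sleep window
            }
            return;
        }
        total++;
        if (!success) failures++;
        // Trip once enough requests were seen and the error percentage reaches the threshold.
        if (total >= requestVolumeThreshold
                && failures * 100 >= errorThresholdPercentage * total) {
            trip();
        }
    }

    private void trip() {
        state = State.OPEN;
        openedAt = clock.getAsLong();
        total = 0; failures = 0;
    }
}

class CircuitBreakerDemo {
    public static void main(String[] args) {
        long[] now = {0}; // fake clock
        SimpleCircuitBreaker cb = new SimpleCircuitBreaker(20, 50, 5000, () -> now[0]);
        for (int i = 0; i < 20; i++) {
            final int n = i;
            cb.call(() -> {
                if (n % 2 == 0) throw new RuntimeException("boom"); // every other call fails
                return "ok";
            }, () -> "fallback");
        }
        System.out.println(cb.state());                            // OPEN
        System.out.println(cb.call(() -> "ok", () -> "fallback")); // fallback (short-circuited)
        now[0] = 5000;
        System.out.println(cb.call(() -> "ok", () -> "fallback")); // ok (trial request)
        System.out.println(cb.state());                            // CLOSED
    }
}
```

Injecting the clock keeps the sleep-window logic testable without real waiting.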

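3. Service degradation

There are two scenarios:

a. When a downstream service responds too slowly for some reason, it proactively switches off some less important features to free server resources and speed up its responses.

b. When a downstream service becomes unavailable for some reason, the upstream service proactively runs some local degradation logic instead, avoiding hangs and answering the user quickly.

In short:

"The server is busy, please try again later": instead of keeping the client waiting, immediately return a friendly fallback response.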
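At its core, a fallback means: try the real call and, if it fails, return a prepared friendly answer instead of propagating the error. A minimal plain-Java sketch of that idea, where the Degrade.withFallback helper is hypothetical and not a Hystrix API:

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical helper: run the primary call; on any RuntimeException,
// return a friendly fallback value instead of letting the error reach the user.
final class Degrade {
    private Degrade() {
    }

    static <T> T withFallback(Supplier<T> primary, Function<RuntimeException, T> fallback) {
        try {
            return primary.get();
        } catch (RuntimeException e) {
            return fallback.apply(e);
        }
    }
}

class DegradeDemo {
    public static void main(String[] args) {
        String ok = Degrade.withFallback(() -> "payment ok", e -> "Server busy, please try again later");
        String degraded = Degrade.withFallback(
                () -> { throw new IllegalStateException("downstream unavailable"); },
                e -> "Server busy, please try again later (" + e.getMessage() + ")");
        System.out.println(ok);
        System.out.println(degraded);
    }
}
```

Passing the exception to the fallback lets it tailor the message, similar to how a Hystrix fallback may inspect the execution exception.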

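4. Rate limiting

For high-concurrency operations such as flash sales, requests must not all rush in at once: they line up and are processed in order, N per second.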
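The "N per second" idea can be illustrated with a fixed-window counter. This is only one of several limiting algorithms (token bucket and sliding window are common alternatives), it rejects excess requests rather than queueing them, and it is not how Sentinel is implemented; FixedWindowRateLimiter and its injectable clock are hypothetical.

```java
import java.util.function.LongSupplier;

// Hypothetical fixed-window limiter: admits at most permitsPerSecond requests per
// one-second window and rejects the rest (a queueing limiter would make them wait instead).
class FixedWindowRateLimiter {
    private final int permitsPerSecond;
    private final LongSupplier clockMillis;
    private long windowStart;
    private int used;

    FixedWindowRateLimiter(int permitsPerSecond, LongSupplier clockMillis) {
        this.permitsPerSecond = permitsPerSecond;
        this.clockMillis = clockMillis;
        this.windowStart = clockMillis.getAsLong();
    }

    synchronized boolean tryAcquire() {
        long now = clockMillis.getAsLong();
        if (now - windowStart >= 1000) { // a new one-second window begins: reset the count
            windowStart = now;
            used = 0;
        }
        if (used < permitsPerSecond) {
            used++;
            return true;
        }
        return false; // over the limit: reject, so the caller can degrade
    }
}

class RateLimiterDemo {
    public static void main(String[] args) {
        FixedWindowRateLimiter limiter = new FixedWindowRateLimiter(3, System::currentTimeMillis);
        for (int i = 1; i <= 5; i++) {
            System.out.println("request " + i + (limiter.tryAcquire() ? " admitted" : " rejected"));
        }
    }
}
```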

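5. Degradation vs. circuit breaking

Similarities:

Same goal: both aim at availability and reliability, to keep the system from collapsing;

Similar user experience: in both cases the user ultimately sees some features as temporarily unavailable;

Differences:

Different triggers: circuit breaking is usually caused by a failure of a particular (downstream) service, whereas degradation is usually decided from the overall system load;

Different management levels: circuit breaking is a framework-level mechanism that every microservice needs (there is no hierarchy), whereas degradation requires ranking the business by importance (degradation usually starts with the outermost services);

Different implementations: service degradation is code-intrusive (done in the controller, or applied automatically), whereas circuit breaking usually trips by itself (self-tripping).

II. Examples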

1. Build cloud-provider-hystrix-payment-8001

POM

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>cloud_2020</artifactId>
        <groupId>com.lee.springcloud</groupId>
        <version>1.0-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>

    <artifactId>cloud-provider-hystrix-payment-8001</artifactId>

    <dependencies>
        <!--hystrix-->
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-starter-netflix-hystrix</artifactId>
        </dependency>
        <!--eureka client-->
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-starter-netflix-eureka-client</artifactId>
        </dependency>
        <dependency>
            <groupId>com.lee.springcloud</groupId>
            <artifactId>cloud-api-common</artifactId>
            <version>${project.version}</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <!--monitoring-->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-actuator</artifactId>
        </dependency>
        <!--hot reload-->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-devtools</artifactId>
            <scope>runtime</scope>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>
</project>

application.yml

server:
  port: 8001

spring:
  application:
    name: cloud-provider-hystrix-payment

eureka:
  client:
    register-with-eureka: true
    fetch-registry: true
    service-url:
      defaultZone: http://eureka7001.com:7001/eureka

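Main application class:

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.eureka.EnableEurekaClient;

@SpringBootApplication
@EnableEurekaClient
public class PaymentHystrixMain8001 {
    public static void main(String[] args) {
        SpringApplication.run(PaymentHystrixMain8001.class, args);
    }
}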

service

import java.util.concurrent.TimeUnit;
import org.springframework.stereotype.Service;

@Service
public class PaymentService {
    // normal access
    public String paymentInfo_ok(Integer id) {
        return "thread:" + Thread.currentThread().getName() + " payment ok id : " + id + " ^_^";
    }

    // slow access: sleeps 3 s to simulate a timeout
    public String paymentInfo_timeout(Integer id) throws InterruptedException {
        int timeNumber = 3;
        TimeUnit.SECONDS.sleep(timeNumber);
        return "thread:" + Thread.currentThread().getName() + " payment timeout id : " + id + " ╥﹏╥";
    }
}

controller

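import javax.annotation.Resource;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@RestController
@Slf4j
public class PaymentController {
    @Resource
    private PaymentService paymentService;

    @Value("${server.port}")
    private String servicePort;

    // normal access
    @GetMapping("/payment/hystrix/ok/{id}")
    public String paymentInfo_OK(@PathVariable("id") Integer id) {
        String result = paymentService.paymentInfo_ok(id);
        return result;
    }

    // slow (timeout) access
    @GetMapping("/payment/hystrix/timeout/{id}")
    public String paymentInfo_TimeOut(@PathVariable("id") Integer id) throws InterruptedException {
        String result = paymentService.paymentInfo_timeout(id);
        return result;
    }
}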

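Test:

1. Start eureka-service-7001

2. Start hystrix-payment-8001

3. Request http://localhost:8001/payment/hystrix/ok/1

The result comes back immediately.

4. Request http://localhost:8001/payment/hystrix/timeout/1

The result comes back after about 3 s.

JMeter stress test

1. In JMeter, create a thread group with 200 (or 2000) threads, looping 100 times

2. Have JMeter hit http://localhost:8001/payment/hystrix/timeout/1

3. Meanwhile, open http://localhost:8001/payment/hystrix/ok/1 in a browser

The page spins for a long time before the result appears.

Cause:

While JMeter is hammering the timeout method, all of Tomcat's worker threads (200 by default in Spring Boot) are occupied, so when the ok method is requested there is no free thread left to handle it.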
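The exhaustion can be reproduced in plain Java without Tomcat or JMeter. In this sketch (all names hypothetical) a two-thread pool stands in for Tomcat's worker pool, two slow requests hold both workers, and a fast request then cannot finish in time. Giving the slow calls their own pool, a bulkhead in the spirit of Hystrix's default THREAD isolation strategy, lets the fast request through.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class ThreadExhaustionDemo {
    // Returns true if a fast "ok" request completes within waitMillis while two slow
    // "timeout" requests are in flight. With isolateSlowCalls the slow requests run in
    // their own pool (a bulkhead), so they cannot occupy the shared workers.
    static boolean okRequestFinishedWithin(long waitMillis, boolean isolateSlowCalls) {
        ExecutorService shared = Executors.newFixedThreadPool(2); // stand-in for Tomcat's workers
        ExecutorService slowPool = isolateSlowCalls ? Executors.newFixedThreadPool(2) : shared;
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < 2; i++) {
                slowPool.submit(() -> {
                    release.await(); // hold the worker, like a request stuck on a slow call
                    return null;
                });
            }
            Future<String> ok = shared.submit(() -> "payment ok");
            ok.get(waitMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException | ExecutionException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            release.countDown(); // let the slow requests finish
            shared.shutdown();
            slowPool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("shared pool:   " + okRequestFinishedWithin(300, false));
        System.out.println("isolated pool: " + okRequestFinishedWithin(2000, true));
    }
}
```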

2. Build cloud-consumer-feign-hystrix-order-80

POM

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>cloud_2020</artifactId>
        <groupId>com.lee.springcloud</groupId>
        <version>1.0-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>

    <artifactId>cloud-consumer-feign-hystrix-order-80</artifactId>

    <dependencies>
        <!--openfeign-->
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-starter-openfeign</artifactId>
        </dependency>
        <!--eureka client-->
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-starter-netflix-eureka-client</artifactId>
        </dependency>
        <dependency>
            <groupId>com.lee.springcloud</groupId>
            <artifactId>cloud-api-common</artifactId>
            <version>${project.version}</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <!--monitoring-->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-actuator</artifactId>
        </dependency>
        <!--hot reload-->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-devtools</artifactId>
            <scope>runtime</scope>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>
</project>

application.yml

server:
  port: 80

eureka:
  client:
    register-with-eureka: false
    fetch-registry: true
    service-url:
      defaultZone: http://eureka7001.com:7001/eureka

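Main application class:

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.eureka.EnableEurekaClient;
import org.springframework.cloud.openfeign.EnableFeignClients;

@SpringBootApplication
@EnableEurekaClient
@EnableFeignClients
public class OrderHystrixMain80 {
    public static void main(String[] args) {
        SpringApplication.run(OrderHystrixMain80.class, args);
    }
}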

service

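import org.springframework.cloud.openfeign.FeignClient;
import org.springframework.stereotype.Component;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;

@Component
@FeignClient(value = "CLOUD-PROVIDER-HYSTRIX-PAYMENT")
public interface PaymentHystrixService {
    // normal access
    @GetMapping("/payment/hystrix/ok/{id}")
    public String paymentInfo_OK(@PathVariable("id") Integer id);

    // slow (timeout) access
    @GetMapping("/payment/hystrix/timeout/{id}")
    public String paymentInfo_TimeOut(@PathVariable("id") Integer id);
}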

controller

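import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@RestController
@Slf4j
public class OrderHyrixController {
    @Autowired
    private PaymentHystrixService paymentHystrixService;

    @GetMapping("/consumer/payment/hystrix/ok/{id}")
    public String paymentInfo_OK(@PathVariable("id") Integer id) {
        return paymentHystrixService.paymentInfo_OK(id);
    }

    @GetMapping("/consumer/payment/hystrix/timeout/{id}")
    public String paymentInfo_TimeOut(@PathVariable("id") Integer id) {
        return paymentHystrixService.paymentInfo_TimeOut(id);
    }
}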

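Test:

1. Start eureka-7001

2. Start hystrix-payment-8001

3. Start hystrix-order-80

4. Request http://localhost/consumer/payment/hystrix/ok/2

Stress test:

1. In JMeter, create a thread group with 200 (or 2000) threads, looping 100 times

2. Have JMeter hit http://localhost:8001/payment/hystrix/timeout/2

3. Meanwhile, open http://localhost/consumer/payment/hystrix/ok/2 in a browser

The page spins for a long time before the result appears, or fails outright with:

Read timed out executing GET http://CLOUD-PROVIDER-HYSTRIX-PAYMENT/payment/hystrix/ok/2

3. Solutions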

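3.1. Service degradation

3.1.1. Provider-side degradation

Degradation is configured with @HystrixCommand.

8001 first fixes the problem on its own side: set an upper limit (timeout) for the call; within that limit the method runs normally, and beyond it the call is degraded to a fallback.

Make the following changes to cloud-provider-hystrix-payment-8001:

Main application class: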

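import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.eureka.EnableEurekaClient;
import org.springframework.cloud.netflix.hystrix.EnableHystrix;

@SpringBootApplication
@EnableEurekaClient
@EnableHystrix // same effect as @EnableCircuitBreaker here: @EnableHystrix is meta-annotated with it
public class PaymentHystrixMain8001 {
    public static void main(String[] args) {
        SpringApplication.run(PaymentHystrixMain8001.class, args);
    }
}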

service:

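import java.util.concurrent.TimeUnit;
import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
import com.netflix.hystrix.contrib.javanica.annotation.HystrixProperty;
import org.springframework.stereotype.Service;

@Service
public class PaymentService {
    // normal access
    public String paymentInfo_ok(Integer id) {
        return "thread:" + Thread.currentThread().getName() + " payment ok id : " + id + " ^_^";
    }

    // slow access, protected by a 3 s Hystrix timeout
    @HystrixCommand(fallbackMethod = "paymentInfo_TimeOut_handler", commandProperties = {
            @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds", value = "3000")
    })
    public String paymentInfo_timeout(Integer id) throws InterruptedException {
        // int a = 100/0; // uncomment to trigger the fallback with an exception instead of a timeout
        int timeNumber = 5;
        TimeUnit.SECONDS.sleep(timeNumber);
        return "thread:" + Thread.currentThread().getName() + " payment timeout id : " + id + " ╥﹏╥";
    }

    // degradation fallback: same parameter list as the protected method
    public String paymentInfo_TimeOut_handler(Integer id) {
        return "Service call timed out or threw an exception " + Thread.currentThread().getName();
    }
}

Test:

1. Start eureka-7001

2. Start provider-hystrix-payment-8001

3. Request http://localhost:8001/payment/hystrix/timeout/1

Result: Service call timed out or threw an exception HystrixTimer-1

Because @HystrixCommand sets the timeout to 3 s while the method sleeps for 5 s (timeNumber = 5), the call exceeds 3 s and the fallback method is invoked directly.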
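The "deadline plus fallback" behaviour that execution.isolation.thread.timeoutInMilliseconds configures can be approximated with CompletableFuture (Java 9+). This is an analogy, not how Hystrix works internally: times are scaled down (a 300 ms deadline around 500 ms of work instead of 3 s and 5 s), and slowPayment and timeoutHandler are hypothetical stand-ins for paymentInfo_timeout and paymentInfo_TimeOut_handler. Unlike Hystrix, which by default interrupts the timed-out thread (execution.isolation.thread.interruptOnTimeout), orTimeout leaves the slow task running in the background.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

class TimeoutFallbackDemo {
    // Simulated slow provider method, standing in for paymentInfo_timeout.
    static String slowPayment(int id, long workMillis) {
        try {
            TimeUnit.MILLISECONDS.sleep(workMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "payment timeout id : " + id;
    }

    // Fallback, playing the role of paymentInfo_TimeOut_handler.
    static String timeoutHandler(int id) {
        return "Service call timed out or threw an exception, id : " + id;
    }

    // Runs the call with a deadline; on timeout (or any exception) returns the fallback.
    // orTimeout only completes the future exceptionally; the sleeping task is not interrupted.
    static String callWithTimeout(int id, long workMillis, long timeoutMillis) {
        return CompletableFuture.supplyAsync(() -> slowPayment(id, workMillis))
                .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
                .exceptionally(ex -> timeoutHandler(id))
                .join();
    }

    public static void main(String[] args) {
        System.out.println(callWithTimeout(1, 500, 300)); // work takes longer than the deadline
        System.out.println(callWithTimeout(1, 100, 300)); // work finishes in time
    }
}
```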

3.1.2. Consumer-side degradation

Make the following changes to cloud-consumer-feign-hystrix-order-80:

Add to the POM:

<!--hystrix-->
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-hystrix</artifactId>
</dependency>

Add to application.yml:

feign:
  hystrix:
    enabled: true

Add to the main class:

@EnableHystrix // same effect as @EnableCircuitBreaker: @EnableHystrix is meta-annotated with it

## Underlying source of @EnableHystrix:

@Target({ElementType.TYPE})
@Retention(RetentionPolicy.RUNTIME)
@Documented
@Inherited
@EnableCircuitBreaker
public @interface EnableHystrix {
}

controller:

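import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
import com.netflix.hystrix.contrib.javanica.annotation.HystrixProperty;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@RestController
@Slf4j
public class OrderHyrixController {
    @Autowired
    private PaymentHystrixService paymentHystrixService;

    @GetMapping("/consumer/payment/hystrix/ok/{id}")
    public String paymentInfo_OK(@PathVariable("id") Integer id) {
        return paymentHystrixService.paymentInfo_OK(id);
    }

    // consumer-side timeout of 1.5 s, with its own fallback
    @GetMapping("/consumer/payment/hystrix/timeout/{id}")
    @HystrixCommand(fallbackMethod = "paymentInfo_TimeOut_fallback_method", commandProperties = {
            @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds", value = "1500")
    })
    public String paymentInfo_TimeOut(@PathVariable("id") Integer id) {
        return paymentHystrixService.paymentInfo_TimeOut(id);
    }

    public String paymentInfo_TimeOut_fallback_method(@PathVariable("id") Integer id) {
        return "this is 80 port, consumer, the remote payment API failed or timed out, now running our own fallback degradation method " + id;
    }
}

Test:

1. Start eureka-7001

2. Start provider-hystrix-payment-8001

3. Start consumer-feign-hystrix-order-80

4. Request http://localhost/consumer/payment/hystrix/timeout/1

Result: this is 80 port, consumer, the remote payment API failed or timed out, now running our own fallback degradation method 1

The 80 side sets its @HystrixCommand timeout to 1.5 s, while 8001 only gives up after 3 s (its own timeout), which is longer than 80's 1.5 s. So 80 times out first and calls its own fallback method.

By then 80 has stopped waiting, so whatever 8001 eventually returns, including its own fallback, never reaches the caller.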
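The ordering of the two timeouts can be checked with a CompletableFuture analogy: the provider wraps its slow work in its own deadline and fallback, and the consumer wraps the whole remote call in a shorter one. Timings are scaled down (consumer 150 ms, provider 300 ms, work 500 ms) and every name is hypothetical; a real call would cross the network through Feign rather than a local thread pool.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class NestedTimeoutDemo {
    // Daemon threads so leftover background work does not keep the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });

    static String sleepThen(long millis, String value) {
        try {
            TimeUnit.MILLISECONDS.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return value;
    }

    // "8001": 500 ms of work inside a 300 ms deadline, with its own fallback.
    static String provider(int id) {
        return CompletableFuture.supplyAsync(() -> sleepThen(500, "provider result " + id), POOL)
                .orTimeout(300, TimeUnit.MILLISECONDS)
                .exceptionally(ex -> "provider fallback " + id)
                .join();
    }

    // "80": the whole remote call inside a shorter 150 ms deadline.
    static String consumer(int id) {
        return CompletableFuture.supplyAsync(() -> provider(id), POOL)
                .orTimeout(150, TimeUnit.MILLISECONDS)
                .exceptionally(ex -> "consumer fallback " + id)
                .join();
    }

    public static void main(String[] args) {
        System.out.println(consumer(1)); // the consumer gives up at 150 ms, before the provider's 300 ms fallback
    }
}
```

Whichever layer has the shorter deadline answers first, which is why tuning the consumer and provider timeouts together matters.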

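3.1.3. The code-bloat problem

Writing a dedicated fallback method for every endpoint bloats the code. Since most Hystrix handling is done on the consumer side, we modify cloud-consumer-feign-hystrix-order-80 to use one controller-wide default fallback.

Modify the controller:

import com.netflix.hystrix.contrib.javanica.annotation.DefaultProperties;
import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@RestController
@Slf4j
@DefaultProperties(defaultFallback = "paymentInfo_TimeOut_global_fallback_method")
public class OrderHyrixController {
    @Autowired
    private PaymentHystrixService paymentHystrixService;

    @GetMapping("/consumer/payment/hystrix/ok/{id}")
    public String paymentInfo_OK(@PathVariable("id") Integer id) {
        return paymentHystrixService.paymentInfo_OK(id);
    }

    @GetMapping("/consumer/payment/hystrix/timeout/{id}")
    @HystrixCommand // any method that needs degradation can add this; it uses the controller-wide default fallback
    public String paymentInfo_TimeOut(@PathVariable("id") Integer id) {
        return paymentHystrixService.paymentInfo_TimeOut(id);
    }

    public String paymentInfo_TimeOut_fallback_method(@PathVariable("id") Integer id) {
        return "this is 80 port, consumer, the remote payment API failed or timed out, now running our own fallback degradation method " + id;
    }

    // the global (default) fallback takes no parameters, because it is shared by methods with different signatures
    public String paymentInfo_TimeOut_global_fallback_method() {
        return "this is global 80 port, consumer, the remote payment API failed or timed out, now running our own fallback degradation method";
    }
}

Test:

Same as in 3.1.2.

Also test with the provider shut down.

Result:

this is global 80 port, consumer, the remote payment API failed or timed out, now running our own fallback degradation method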
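The lookup rule behind @DefaultProperties can be mimicked in plain Java: one class-wide fallback without parameters, used by every guarded method unless that method supplies its own. This is a hypothetical illustration of the rule, not how Hystrix's javanica module resolves fallbacks; GuardedController and its methods are made-up names.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for a controller annotated with @DefaultProperties(defaultFallback = ...).
class GuardedController {
    // Class-wide default fallback: no parameters, so methods with any signature can share it.
    private String globalFallback() {
        return "global fallback: payment service failed or timed out";
    }

    // Like a bare @HystrixCommand: failures go to the class-wide default fallback.
    private String guarded(Supplier<String> call) {
        return guarded(call, this::globalFallback);
    }

    // Like @HystrixCommand(fallbackMethod = ...): a method-specific fallback takes precedence.
    private String guarded(Supplier<String> call, Supplier<String> fallback) {
        try {
            return call.get();
        } catch (RuntimeException e) {
            return fallback.get();
        }
    }

    String timeout(int id, boolean providerDown) {
        return guarded(() -> remote(id, providerDown));
    }

    String timeoutWithOwnFallback(int id, boolean providerDown) {
        return guarded(() -> remote(id, providerDown), () -> "method fallback for id " + id);
    }

    // Simulated remote call that fails when the provider is shut down.
    private String remote(int id, boolean providerDown) {
        if (providerDown) {
            throw new IllegalStateException("connection refused");
        }
        return "payment timeout id : " + id;
    }
}

class GuardedControllerDemo {
    public static void main(String[] args) {
        GuardedController c = new GuardedController();
        System.out.println(c.timeout(1, false));               // payment timeout id : 1
        System.out.println(c.timeout(1, true));                // global fallback: payment service failed or timed out
        System.out.println(c.timeoutWithOwnFallback(1, true)); // method fallback for id 1
    }
}
```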

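3.1.4. Fallbacks mixed with business logic

Above, the fallback methods sit in the controller alongside the business logic, so the layering is unclear.

Solution: service 80 uses Feign, so for each Feign client interface we can create a fallback class that implements it; here PaymentFallbackService implements PaymentHystrixService.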

PaymentHystrixService

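import org.springframework.cloud.openfeign.FeignClient;
import org.springframework.stereotype.Component;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;

@Component
//@FeignClient(value = "CLOUD-PROVIDER-HYSTRIX-PAYMENT")
@FeignClient(value = "CLOUD-PROVIDER-HYSTRIX-PAYMENT", fallback = PaymentFallbackService.class)
public interface PaymentHystrixService {
    // normal access
    @GetMapping("/payment/hystrix/ok/{id}")
    public String paymentInfo_OK(@PathVariable("id") Integer id);

    // slow (timeout) access
    @GetMapping("/payment/hystrix/timeout/{id}")
    public String paymentInfo_TimeOut(@PathVariable("id") Integer id);
}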

PaymentFallbackService

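import org.springframework.stereotype.Component;

// Feign calls this bean when the corresponding remote call fails or times out (requires feign.hystrix.enabled: true)
@Component
public class PaymentFallbackService implements PaymentHystrixService {
    @Override
    public String paymentInfo_OK(Integer id) {
        return "----------------->paymentInfo_Ok_fallback_method";
    }

    @Override
    public String paymentInfo_TimeOut(Integer id) {
        return "----------------->paymentInfo_TimeOut_fallback_method";
    }
}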

In the controller, delete both the class-level @DefaultProperties(defaultFallback = "paymentInfo_TimeOut_global_fallback_method") and the @HystrixCommand annotation on the method.

Test:

Same as in 3.1.2.

Also test with the provider shut down.

Result:

----------------->paymentInfo_TimeOut_fallback_method
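Conceptually, fallback = PaymentFallbackService.class tells Feign to route a failed call to the same method on another implementation of the interface. That routing can be sketched with a JDK dynamic proxy; this is a hypothetical illustration, not Feign's actual implementation, and PaymentApi below is a trimmed, annotation-free copy of PaymentHystrixService.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

// Trimmed copy of the Feign client interface (Spring and Feign annotations omitted).
interface PaymentApi {
    String paymentInfo_OK(Integer id);

    String paymentInfo_TimeOut(Integer id);
}

// Plays the role of PaymentFallbackService: one fallback per interface method.
class PaymentApiFallback implements PaymentApi {
    public String paymentInfo_OK(Integer id) {
        return "----------------->paymentInfo_Ok_fallback_method";
    }

    public String paymentInfo_TimeOut(Integer id) {
        return "----------------->paymentInfo_TimeOut_fallback_method";
    }
}

// Simulates calls to a provider that has been shut down.
class DownPaymentApi implements PaymentApi {
    public String paymentInfo_OK(Integer id) {
        throw new IllegalStateException("connection refused");
    }

    public String paymentInfo_TimeOut(Integer id) {
        throw new IllegalStateException("connection refused");
    }
}

final class FallbackProxy {
    private FallbackProxy() {
    }

    // Returns a proxy that calls `primary` and, if that call throws, invokes the same method on `fallback`.
    static <T> T wrap(Class<T> api, T primary, T fallback) {
        return api.cast(Proxy.newProxyInstance(api.getClassLoader(), new Class<?>[] {api},
                (proxy, method, args) -> {
                    try {
                        return method.invoke(primary, args);
                    } catch (InvocationTargetException e) { // the primary implementation threw
                        return method.invoke(fallback, args);
                    }
                }));
    }
}

class FallbackProxyDemo {
    public static void main(String[] args) {
        PaymentApi api = FallbackProxy.wrap(PaymentApi.class, new DownPaymentApi(), new PaymentApiFallback());
        System.out.println(api.paymentInfo_TimeOut(1)); // ----------------->paymentInfo_TimeOut_fallback_method
    }
}
```

Keeping the fallback in its own class is what removes the degradation code from the controller.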

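3.2. Circuit breaking

Circuit breaking is a call-chain protection mechanism for microservices that counters the avalanche effect. When a microservice in a fan-out call chain becomes unavailable or responds too slowly, calls to it are degraded, and then calls to that node are cut off (the circuit opens) so that an error response is returned quickly.

Once calls to the microservice are detected to respond normally again (a certain share of them succeed), the call chain is restored.

In Spring Cloud, the circuit breaker is implemented by Hystrix, which monitors the calls between microservices.

When the failure ratio reaches a threshold (by default: at least 20 requests within a 10-second window, of which 50% or more fail), the circuit breaker trips. Circuit breaking is configured with the @HystrixCommand annotation.

Modify cloud-provider-hystrix-payment-8001.

Add the following to PaymentService: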

// circuit breaker
@HystrixCommand(fallbackMethod = "paymentInfo_circuitBreaker_handler", commandProperties = {
        @HystrixProperty(name = "circuitBreaker.enabled", value = "true"), // enable the circuit breaker
        @HystrixProperty(name = "circuitBreaker.requestVolumeThreshold", value = "10"), // minimum requests in the rolling window
        @HystrixProperty(name = "circuitBreaker.sleepWindowInMilliseconds", value = "10000"), // sleep window: how long the breaker stays open
        @HystrixProperty(name = "circuitBreaker.errorThresholdPercentage", value = "60") // error percentage that trips the breaker
})

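public String paymentInfo_circuitBreaker(Integer id) {
    if (id < 0) {
        throw new RuntimeException("----->id must not be negative.");
    }
    String serialNumber = UUID.randomUUID().toString(); // requires import java.util.UUID
    return "paymentInfo_circuitBreaker succeeded, serial number: " + serialNumber;
}

// circuit-breaker degradation fallback
public String paymentInfo_circuitBreaker_handler(Integer id) {
    return "ID must not be negative, please try again later......." + id;
}

Add the following to the controller:

// circuit breaker
@GetMapping("/payment/hystrix/circuitBreaker/{id}")
public String paymentInfo_circuitBreaker(@PathVariable("id") Integer id) {
    String result = paymentService.paymentInfo_circuitBreaker(id);
    log.info("----->" + result);
    return result;
}

Self-test:

1. Start eureka-7001

2. Start cloud-provider-hystrix-payment-8001

3. Request http://localhost:8001/payment/hystrix/circuitBreaker/1

Result: paymentInfo_circuitBreaker succeeded, serial number: ae6598ff-34f5-4d36-baa2-c8125bf1722a

4. Then request http://localhost:8001/payment/hystrix/circuitBreaker/-1

Result: ID must not be negative, please try again later.......-1

5. Rapidly request http://localhost:8001/payment/hystrix/circuitBreaker/-1 many times in a row,

then request http://localhost:8001/payment/hystrix/circuitBreaker/1

Even the valid request now returns the fallback: ID must not be negative, please try again later.......1 (the breaker is open, so the call is short-circuited to the fallback, which receives id = 1)

Request http://localhost:8001/payment/hystrix/circuitBreaker/1 a few more times; once the 10-second sleep window has passed, a trial request succeeds and the breaker closes:

Result: paymentInfo_circuitBreaker succeeded, serial number: 472fc9d5-e171-4b65-a541-b81acc8eeb5a

Success, then failure, then success: the breaker reacts to the number and ratio of failures within the configured time window, so it is working as intended.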

Note:

// All the parameters used by @HystrixCommand's circuit breaker are defined in the HystrixCommandProperties class (excerpt):
public abstract class HystrixCommandProperties {
    private static final Logger logger = LoggerFactory.getLogger(HystrixCommandProperties.class);

    static final Integer default_metricsRollingStatisticalWindow = 10000; // rolling statistics window: 10 s
    private static final Integer default_metricsRollingStatisticalWindowBuckets = 10; // split into 10 buckets
    private static final Integer default_circuitBreakerRequestVolumeThreshold = 20;
    private static final Integer default_circuitBreakerSleepWindowInMilliseconds = 5000;
    private static final Integer default_circuitBreakerErrorThresholdPercentage = 50;
    private static final Boolean default_circuitBreakerForceOpen = false;
    static final Boolean default_circuitBreakerForceClosed = false;
    //......
}
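The defaults above describe a 10,000 ms statistics window split into 10 buckets. A minimal sketch of such a bucketed rolling counter (hypothetical class RollingWindow, single-threaded, not Hystrix's own implementation) shows how old buckets age out, so only roughly the last 10 seconds count toward the error percentage:

```java
import java.util.Arrays;
import java.util.function.LongSupplier;

// Hypothetical bucketed rolling counter: bucketCount buckets of bucketMillis each,
// mirroring Hystrix's defaults of a 10000 ms window split into 10 buckets.
class RollingWindow {
    private final int bucketCount;
    private final long bucketMillis;
    private final LongSupplier clock;
    private final long[] bucketStart; // start time of the period each slot currently holds
    private final int[] successes;
    private final int[] failures;

    RollingWindow(int bucketCount, long bucketMillis, LongSupplier clock) {
        this.bucketCount = bucketCount;
        this.bucketMillis = bucketMillis;
        this.clock = clock;
        this.bucketStart = new long[bucketCount];
        this.successes = new int[bucketCount];
        this.failures = new int[bucketCount];
        Arrays.fill(bucketStart, Long.MIN_VALUE); // no slot holds a bucket yet
    }

    void record(boolean success) {
        long period = clock.getAsLong() / bucketMillis; // absolute index of the current bucket
        int slot = (int) (period % bucketCount);
        long start = period * bucketMillis;
        if (bucketStart[slot] != start) { // slot holds an expired bucket: recycle it
            bucketStart[slot] = start;
            successes[slot] = 0;
            failures[slot] = 0;
        }
        if (success) {
            successes[slot]++;
        } else {
            failures[slot]++;
        }
    }

    // Returns {total requests, failed requests} over the buckets still inside the window.
    int[] totals() {
        long oldestStart = (clock.getAsLong() / bucketMillis - bucketCount + 1) * bucketMillis;
        int total = 0;
        int failed = 0;
        for (int i = 0; i < bucketCount; i++) {
            if (bucketStart[i] >= oldestStart) {
                total += successes[i] + failures[i];
                failed += failures[i];
            }
        }
        return new int[] {total, failed};
    }

    int errorPercentage() {
        int[] t = totals();
        return t[0] == 0 ? 0 : t[1] * 100 / t[0];
    }
}

class RollingWindowDemo {
    public static void main(String[] args) {
        long[] now = {0}; // fake clock in milliseconds
        RollingWindow window = new RollingWindow(10, 1000, () -> now[0]);
        for (int i = 0; i < 10; i++) window.record(false); // t = 0 ms: 10 failures
        now[0] = 5000;
        for (int i = 0; i < 10; i++) window.record(true);  // t = 5000 ms: 10 successes
        System.out.println(window.errorPercentage());       // 50
        now[0] = 10000;                                       // the t = 0 bucket has left the window
        System.out.println(window.errorPercentage());       // 0
    }
}
```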

3.3. Rate limiting

This will be covered later, in the Spring Cloud Alibaba posts on Sentinel.

Source: utcz.com/z/515535.html
