Introduction to the YTMicro (云途) ECC Mechanism
GaoSheng Lv5

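ECC (Error Correction Code) is a technique for detecting and correcting errors in Flash memory.
YTMicro MCUs automatically correct single-bit errors and report 2-bit or multi-bit errors. A single-bit ECC error needs no handling; a multi-bit error jumps to HardFault by default.
This article gives a brief introduction to how UtilityFlashEccFault, the YTMicro ECC error handling library, works.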
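
To make "correct one bit, report two" concrete, here is a minimal sketch of a single-error-correct, double-error-detect (SEC-DED) code on a 4-bit value, using an extended Hamming(8,4) layout. This is a toy illustration only: the article does not describe the actual code YTMicro's flash controller uses, and all names here (secded_encode, secded_decode, ecc_status_t) are made up for the example.

```c
#include <stdint.h>

typedef enum { ECC_OK = 0, ECC_CORRECTED = 1, ECC_UNCORRECTABLE = 2 } ecc_status_t;

/* Data bits live at codeword positions 3, 5, 6, 7; parity bits at 1, 2, 4;
 * bit 0 holds the overall parity of the whole byte. */
static const uint8_t DATA_POS[4] = {3, 5, 6, 7};

/* XOR of the positions (1..7) of all set bits: zero for a valid codeword. */
static uint8_t position_xor(uint8_t cw)
{
    uint8_t syn = 0;
    for (uint8_t pos = 1; pos <= 7; pos++) {
        if (cw & (1u << pos)) {
            syn ^= pos;
        }
    }
    return syn;
}

/* Returns 1 if the byte has an odd number of set bits. */
static uint8_t parity8(uint8_t v)
{
    v ^= (uint8_t)(v >> 4);
    v ^= (uint8_t)(v >> 2);
    v ^= (uint8_t)(v >> 1);
    return v & 1u;
}

uint8_t secded_encode(uint8_t nibble)
{
    uint8_t cw = 0;
    for (int i = 0; i < 4; i++) {
        if (nibble & (1u << i)) {
            cw |= (uint8_t)(1u << DATA_POS[i]);
        }
    }
    uint8_t syn = position_xor(cw);
    cw |= (uint8_t)((syn & 1u) << 1); /* parity bit at position 1 */
    cw |= (uint8_t)((syn & 2u) << 1); /* parity bit at position 2 */
    cw |= (uint8_t)((syn & 4u) << 2); /* parity bit at position 4 */
    cw |= parity8(cw);                /* overall parity in bit 0 */
    return cw;
}

ecc_status_t secded_decode(uint8_t cw, uint8_t *nibble)
{
    uint8_t syn = position_xor(cw);
    ecc_status_t status = ECC_OK;

    if (parity8(cw)) {
        /* Odd overall parity: exactly one bit flipped. The syndrome is its
         * position (0 means the overall parity bit itself). */
        cw ^= (uint8_t)(1u << syn);
        status = ECC_CORRECTED;
    } else if (syn != 0) {
        /* Even overall parity but non-zero syndrome: two bits flipped. */
        return ECC_UNCORRECTABLE;
    }

    uint8_t out = 0;
    for (int i = 0; i < 4; i++) {
        if (cw & (1u << DATA_POS[i])) {
            out |= (uint8_t)(1u << i);
        }
    }
    *nibble = out;
    return status;
}
```

Flipping any single bit of an encoded byte yields ECC_CORRECTED with the original data, while flipping two bits yields ECC_UNCORRECTABLE, which corresponds to the case that ends up in HardFault on the MCU.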

First, install the UtilityFlashEccFault component in the YCT tool.

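After installation, initialize the UtilityFlashEccFault module as early as possible in Board_Init. Placing it right after clock initialization is recommended; otherwise odd errors may appear, such as a fault being reported immediately during an ECC injection test (in theory, injecting an ECC error does not trigger a fault right away; the fault only fires when the corrupted address is read).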

How UtilityFlashEccFault works, in outline:
In FLASH_ECC_Fault_Init, FLASH_ECC_Fault_Handler takes over the HardFault exception:

void FLASH_ECC_Fault_Handler(void)
{
    __asm volatile(
        "MOVS R0, #4 \n" // Load immediate value 4 into R0 to test bit 2 of LR.
        "MOV R1, LR \n" // Copy Link Register (LR, holding EXC_RETURN) into R1.
        "TST R1, R0 \n" // Test if bit 2 of LR is set.
        "BNE get_psp \n" // If bit 2 is set, use Process Stack Pointer (PSP).
        "MRS R0, MSP \n" // Otherwise, load Main Stack Pointer (MSP) into R0.
        "B c_handler \n" // Branch to call the C fault handler.
        "get_psp: \n"
        "MRS R0, PSP \n" // Load Process Stack Pointer (PSP) into R0.
        "c_handler: \n"
        "LDR R2, =FLASH_ECC_Fault_Handler_C \n" // Load the address of FLASH_ECC_Fault_Handler_C into R2.
        "BX R2 \n" // Jump to the address in R2.
    );
}

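FLASH_ECC_Fault_Handler may look hard to follow, but all we need to know is that it identifies which stack the exception frame was pushed onto (MSP or PSP) and passes that stack pointer, in R0, as the argument to FLASH_ECC_Fault_Handler_C.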
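
The same decision can be written in C to make the bit test easier to read. On Cortex-M, LR holds an EXC_RETURN value on exception entry, and bit 2 of that value tells which stack received the exception frame. The sketch below takes the values as plain parameters so it can be read and tested off-target; the function and macro names are illustrative, not part of the library.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 2 of EXC_RETURN: 0 = frame was pushed on MSP, 1 = frame was pushed on PSP.
 * The macro name is made up for this example. */
#define DEMO_EXC_RETURN_STACK_BIT (1u << 2)

static inline bool demo_frame_is_on_psp(uint32_t exc_return)
{
    return (exc_return & DEMO_EXC_RETURN_STACK_BIT) != 0u;
}

/* Mirrors the assembly stub: pick the stack pointer that holds the fault frame. */
uint32_t demo_select_fault_stack(uint32_t exc_return, uint32_t msp, uint32_t psp)
{
    return demo_frame_is_on_psp(exc_return) ? psp : msp;
}
```

For example, EXC_RETURN 0xFFFFFFF9 (return to Thread mode using MSP) selects MSP, while 0xFFFFFFFD (return to Thread mode using PSP) selects PSP. The real handler has to do this in assembly because it must read LR before any compiler-generated code changes it.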

void FLASH_ECC_Fault_Handler_C(hw_stackframe_t *fault_stack)
{
    DEV_ASSERT(s_FlashEccFaultStatePtr != NULL);

    /* Read the first halfword of the instruction at the stacked PC. */
    uint16_t opcode = *((uint16_t *)(fault_stack->pc));
    uint8_t instrSize = 0;

    /* Determine the instruction size based on the opcode.
       A Thumb instruction is 32-bit when bits [15:11] are 0b11101, 0b11110 or 0b11111. */
    if ((opcode & 0xE800) == 0xE800 || // Matches 0b11101 and 0b11111
        (opcode & 0xF000) == 0xF000 || // Matches 0b11110 and 0b11111
        (opcode & 0xF800) == 0xF800)   // Matches 0b11111 (already covered above)
    {
        instrSize = 4;
    }
    else
    {
        instrSize = 2;
    }

    /* If the Flash ECC unrecoverable error flag is set, skip the faulting instruction. */
    if (EFM->STS & EFM_STS_UNRECOVERR_MASK)
    {
        fault_stack->pc += instrSize;
#ifdef SCB_SHCSR_BUSFAULTENA_Msk
        SCB->CFSR = SCB_CFSR_BUSFAULTSR_Msk | SCB_CFSR_PRECISERR_Msk;
#endif /* SCB_SHCSR_BUSFAULTENA_Msk */

        if (s_FlashEccFaultStatePtr->eccCallback != NULL)
        {
            s_FlashEccFaultStatePtr->eccCallback();
        }
    }
#ifdef EFM_STS_CI_UNRECOVERR_MASK
    else if (EFM->STS & EFM_STS_CI_UNRECOVERR_MASK)
    {
        fault_stack->pc += instrSize;
        SCB->CFSR = SCB_CFSR_BUSFAULTSR_Msk | SCB_CFSR_PRECISERR_Msk;

        if (s_FlashEccFaultStatePtr->eccCallback != NULL)
        {
            s_FlashEccFaultStatePtr->eccCallback();
        }
    }
#endif /* EFM_STS_CI_UNRECOVERR_MASK */
    else
    {
        /* Not an ECC error: hand the fault to the user's generic handler. */
        if (s_FlashEccFaultStatePtr->otherFaultCallback != NULL)
        {
            s_FlashEccFaultStatePtr->otherFaultCallback(fault_stack);
        }
    }
}

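After receiving the fault stack frame, FLASH_ECC_Fault_Handler_C reads the stacked PC and uses the Thumb encoding rules to decide whether the faulting instruction is 16-bit or 32-bit. It then checks the UNRECOVERR flag (and, where defined, the CI_UNRECOVERR flag) in EFM->STS. If the fault is an ECC error, it advances the stacked PC past the faulting read instruction, so execution does not get stuck in HardFault, and runs eccCallback(). A HardFault from any other cause runs otherFaultCallback() instead.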
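
The instruction-size rule can be isolated into a small helper that is easy to check off-target. In the Thumb encoding, an instruction is 32-bit when bits [15:11] of its first halfword are 0b11101, 0b11110 or 0b11111; everything else is 16-bit. The sketch below is an equivalent restatement of the three mask checks in the handler, and the function name is invented for the example.

```c
#include <stdint.h>

/* Returns the size in bytes (2 or 4) of the Thumb instruction whose first
 * halfword is given. Illustrative helper, not part of UtilityFlashEccFault. */
uint8_t demo_thumb_instr_size(uint16_t first_halfword)
{
    uint16_t top5 = (uint16_t)(first_halfword >> 11); /* bits [15:11] */

    if (top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu) {
        return 4u; /* 0b11101, 0b11110, 0b11111: 32-bit encoding */
    }
    return 2u;     /* every other prefix is a 16-bit encoding */
}
```

For instance, 0x6800 (a 16-bit LDR) gives 2 and 0xF8D0 (the first halfword of a 32-bit LDR.W) gives 4. Getting this right matters because advancing the PC by the wrong amount would resume execution in the middle of an instruction.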

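We still need to provide the EFM_UnrecoveryErrorHandler and HardFault_Callback functions. In practice you do not have to write them from scratch (engineer Frankie Zeng has already provided examples; copying them is recommended XD).
EFM_UnrecoveryErrorHandler erases the sector that contains the error: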

void EFM_UnrecoveryErrorHandler(void)
{
    uint32_t error_addr = EFM->ECCERR_ADDR;
    PRINTF("Flash ECC Un-recovery Error Handler.\r\n");
    PRINTF("Ecc Error Address(EFM->ECCERR_ADDR): 0x%x\r\n", error_addr);

    /* Clear ECC error flag, EFM->ECCERR_ADDR will be cleared at the same time. */
    EFM->STS = EFM_STS_UNRECOVERR_MASK;

    /* The sector containing the ECC error address MUST be erased to fix the error. */
    if (error_addr >= TEST_DATA_START_ADDR && error_addr <= TEST_DATA_END_ADDR)
    {
        FLASH_DRV_EraseSector(FLASH_INST,
                              error_addr & ~(FEATURE_EFM_MAIN_ARRAY_SECTOR_SIZE - 1),
                              FEATURE_EFM_MAIN_ARRAY_SECTOR_SIZE);
        PRINTF("Erased the sector containing the ECC error address.\r\n");
    }
    else
    {
        // Do nothing for the location of the program
    }
}
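
The expression error_addr & ~(FEATURE_EFM_MAIN_ARRAY_SECTOR_SIZE - 1) rounds the error address down to the start of its sector. A small sketch of that arithmetic follows. The 2 KiB sector size is a hypothetical value chosen for the example (on target, use the SDK's FEATURE_EFM_MAIN_ARRAY_SECTOR_SIZE), and the helper names are not from the library.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sector size for the demo only. */
#define DEMO_SECTOR_SIZE 0x800u

/* The mask trick is only valid when the sector size is a power of two. */
bool demo_is_power_of_two(uint32_t value)
{
    return (value != 0u) && ((value & (value - 1u)) == 0u);
}

/* Round an address down to the base address of the sector that contains it. */
uint32_t demo_sector_base(uint32_t addr, uint32_t sector_size)
{
    return addr & ~(sector_size - 1u);
}
```

With a 0x800-byte sector, address 0x12345 maps to sector base 0x12000, so the whole range 0x12000 to 0x127FF is erased even though only one location reported the error. Any valid data in that sector has to be restored by the application afterwards.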

HardFault_Callback is shown below. If you already have your own HardFault handling, it can go here:

void HardFault_Callback(hw_stackframe_t *fault_stack)
{
    /* Note: The handling of your own hardfault can be done here.
       The following code is just an example of how to access the fault stack. */
    /* Note: Debug output inserted just for demo clarity. */
    PRINTF("HardFault Callback. Fault Stack:\r\n");
    PRINTF("R0: 0x%x\r\n", fault_stack->r0);
    PRINTF("R1: 0x%x\r\n", fault_stack->r1);
    PRINTF("R2: 0x%x\r\n", fault_stack->r2);
    PRINTF("R3: 0x%x\r\n", fault_stack->r3);
    PRINTF("R12: 0x%x\r\n", fault_stack->r12);
    PRINTF("LR: 0x%x\r\n", fault_stack->lr);
    PRINTF("PC: 0x%x\r\n", fault_stack->pc);
    PRINTF("PSR: 0x%x\r\n", fault_stack->psr);

    /* Reset for default exception handling in HardFault */
    PRINTF("System Software Reset.\r\n");
    SystemSoftwareReset();
}
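
The fields printed above follow the basic exception frame that a Cortex-M core pushes on exception entry: R0, R1, R2, R3, R12, LR, PC, xPSR, in ascending address order. The sketch below models that layout with a stand-in struct (demo_stackframe_t is a hypothetical name; the library's hw_stackframe_t is assumed, not confirmed, to have the same shape) and shows why editing the pc field changes where execution resumes.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the library's hw_stackframe_t; layout is the basic
 * Cortex-M exception frame (no floating-point extension). */
typedef struct {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;  /* LR of the interrupted code, not EXC_RETURN */
    uint32_t pc;  /* address of the instruction to resume at */
    uint32_t psr;
} demo_stackframe_t;

/* On exception return the core pops this frame, so advancing the stacked PC
 * makes execution resume after the faulting instruction. */
void demo_skip_instruction(demo_stackframe_t *frame, uint8_t instr_size)
{
    frame->pc += instr_size;
}
```

The stacked PC sits at byte offset 24 from the stack pointer captured in the assembly stub, which is why the stub only needs to pass that pointer in R0.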

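FAQ:
Q1. Can ECC errors occur in both Pflash and Dflash?
A1. Yes, in both, but ECC errors are more likely in Dflash because it is erased and programmed more often.

Q2. How can I tell that an ECC error has occurred?
A2. The MCU jumps straight into the HardFault handler on power-up and the UNRECOVERR bit in EFM->STS is set. The symptom persists after a power cycle (when the error is in Dflash), and the flash cannot be read with J-Flash.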

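Q3. What causes Flash ECC errors?
A3. Voltage fluctuation (especially erasing or programming during power-up or power-down), electromagnetic radiation, and device aging (exceeding the flash service life; flash is typically rated for about 100,000 erase/program cycles and 10 years).

Q4. If the ECC error address happens to fall inside the UtilityFlashEccFault library's own handling code, will ECC error handling still work?
A4. No. The only remedy is to mass-erase the entire flash and reprogram it.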
