Skip to content

STM32: Improve speed of uart async implementation #34763

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
hjuul opened this issue May 3, 2021 · 21 comments
Open

STM32: Improve speed of uart async implementation #34763

hjuul opened this issue May 3, 2021 · 21 comments
Assignees
Labels
area: DMA Direct Memory Access area: UART Universal Asynchronous Receiver-Transmitter Enhancement Changes/Updates/Additions to existing features platform: STM32 ST Micro STM32

Comments

@hjuul
Copy link
Contributor

hjuul commented May 3, 2021

I am trying to use the UART async API on stm32f7. It works nicely up to about 460.8kbps, but above that I start loosing characters during buffer switch. I immediately provide new buffers when requested, and increasing the buffer size doesn't help. So this appears to happen when new data arrives before my RX handler is done.

Describe the solution you'd like
I currently have my own implementation of this (without UART async API) where I manually configure the STM32 DMA to use circular mode and set up interrupts on Half-transfer, Transfer Complete and UART RX IDLE. It can sustain data rates far above 1Mbps even with very small buffers.

So I would like the STM32 implementation of the UART async API to use either the circular mode or the double-buffer mode of the DMA peripheral (if available). Then DMA hardware would just continue to receive UART characters regardless of how long time I spend in my RX event handler.

Additional context
I'm on Zephyr v2.5.0

@hjuul hjuul added the Enhancement Changes/Updates/Additions to existing features label May 3, 2021
@erwango erwango added area: DMA Direct Memory Access platform: STM32 ST Micro STM32 labels May 3, 2021
@erwango erwango self-assigned this May 3, 2021
@erwango
Copy link
Member

erwango commented May 3, 2021

^^@shlomow

@shlomow
Copy link
Contributor

shlomow commented May 3, 2021

We implemented our uart driver in the past with circular buffer before we used zephyr. It has some drawbacks. First, if you need hardware flow control then you have to control the rts line in software instead of hardware, and this is a little bit hard since the driver needs to know how much data the application has already read. In addition, if you want to implement the async api then you need to have a dedicated buffer in the driver, which consumes extra ram, and needs an extra memcpy from the uart driver to the application buffers.

In terms of speed, I actually think that the current implementation is better, since you need an extra handling of the flow control in each interrupt (idle, half transfer, full transfer).
As a principle, for high data rates we use hardware flow control and don't rely on driver real time issues.

The above is relevant only for the circular buffer implementation.

As you said, when the dma controller in st has double buffer mode or linked list mode (like mdma in h7) we can easily take advantage of that and the changes in the uart driver should be minor.

So to sum up, I think that the real enhancement is to dma-v1 driver, to support double buffer mode. This can be used also in i2s driver and everything that is "streaming" in its nature like adc, dac, uart, i2s, sai, etc.

@manoj153
Copy link
Contributor

manoj153 commented Jun 5, 2021

@hjuul you mentioned enabling IRQ, you done it in application code or directly on zephyr-driver source,
if you don't mind can you share a snippet?

@hjuul
Copy link
Contributor Author

hjuul commented Jun 7, 2021

@manoj153 I did this in application code. I've scraped together some pieces of it below. As you can see it is a combination of Zephyr API and bare metal for the UART, bare metal for DMA and Zephyr API for interrupt handlers. I've since rewritten this to use Zephyr async uart API which is a lot easier, but also slower.

#define DMA_STREAM_MAP_TX DMA2_Stream7
#define DMA_STREAM_MAP_RX DMA2_Stream5
#define DMA_IRQ_TX DT_IRQ_BY_IDX(DMA_NODE_ID, 7, irq)
#define DMA_IRQ_TX_PRIORITY DT_IRQ_BY_IDX(DMA_NODE_ID, 7, priority)
#define DMA_IRQ_RX DT_IRQ_BY_IDX(DMA_NODE_ID, 5, irq)
#define DMA_IRQ_RX_PRIORITY DT_IRQ_BY_IDX(DMA_NODE_ID, 5, priority)

static int dma_init(void)
{
	/* Map DMA and UART register structs */
	DMA_TypeDef *dma = (DMA_TypeDef *)DMA_BASE_ADDR;
	DMA_Stream_TypeDef *stream_tx = DMA_STREAM_MAP_TX;
	DMA_Stream_TypeDef *stream_rx = DMA_STREAM_MAP_RX;
	USART_TypeDef *uart = (USART_TypeDef *)UART_BASE_ADDR;

	/* Enable DMA clock */
	const struct device *clk = device_get_binding(STM32_CLOCK_CONTROL_NAME);
	struct stm32_pclken pclken = {
		.bus = DT_CLOCKS_CELL(DMA_NODE_ID, bus),
		.enr = DT_CLOCKS_CELL(DMA_NODE_ID, bits),
	};

	if (0 != clock_control_on(clk, (clock_control_subsys_t *)&pclken)) {
		LOG_ERR("Failed to enable DMA clock");
		return -EIO;
	}

	/*  Verify initial status of DMA */
	ASSERT((stream_tx->CR & DMA_SxCR_EN) == 0, "DMA already enabled");
	ASSERT((stream_rx->CR & DMA_SxCR_EN) == 0, "DMA already enabled");
	ASSERT(dma->LISR == 0, "LISR not zero");
	ASSERT(dma->HISR == 0, "HISR not zero");

	/* Register interrupt service routine for TX and RX DMA streams */
	IRQ_CONNECT(DMA_IRQ_TX, DMA_IRQ_TX_PRIORITY, dma_irq_handler_tx, 0, 0);
	irq_enable(DMA_IRQ_TX);
	IRQ_CONNECT(DMA_IRQ_RX, DMA_IRQ_RX_PRIORITY, dma_irq_handler_rx, 0, 0);
	irq_enable(DMA_IRQ_RX);

	/*  Configure DMA tx stream, but do not enable it.
	   Circular mode disabled, interrupt on TC only. */
	stream_tx->PAR = (uint32_t) &(uart->TDR);
	stream_tx->M0AR = (uint32_t)dma_tx_src_buf;
	stream_tx->CR |= (DMA_CHANNEL_USART_TX << DMA_SxCR_CHSEL_Pos) |
			 (1 << DMA_SxCR_PL_Pos) | DMA_SxCR_MINC |
			 (1 << DMA_SxCR_DIR_Pos) | DMA_SxCR_TCIE |
			 DMA_SxCR_TEIE | DMA_SxCR_DMEIE;

	/*  Configure and enable DMA rx stream.
	   Circular mode, interrupts on HT and TC. */
	stream_rx->PAR = (uint32_t) &(uart->RDR);
	stream_rx->M0AR = (uint32_t)dma_rx_dest_buf;
	stream_rx->NDTR = DMA_RX_BUFFER_SIZE;
	stream_rx->CR |= (DMA_CHANNEL_USART_RX << DMA_SxCR_CHSEL_Pos |
			  (1 << DMA_SxCR_PL_Pos) | DMA_SxCR_MINC |
			  DMA_SxCR_CIRC | (0 << DMA_SxCR_DIR_Pos) |
			  DMA_SxCR_TCIE | DMA_SxCR_HTIE | DMA_SxCR_EN);

	/*  Give semaphore to indicate DMA ready */
	k_sem_give(&dma_tx_sem);

	return 0;
}


static int myinit(void)
{
	const struct device *uart_dev = NULL;
	int ret = 0;

	uart_dev = device_get_binding(UART_LABEL);
	struct uart_config cfg = {
		.baudrate = UART_BAUD_RATE,
		.parity = UART_CFG_PARITY_NONE,
		.stop_bits = UART_CFG_STOP_BITS_1,
		.data_bits = UART_CFG_DATA_BITS_8,
		.flow_ctrl = UART_CFG_FLOW_CTRL_NONE,
	};
	ret = uart_configure(uart_dev, &cfg);
	//...
	ret = dma_init();
	//...

	/* Configure UART to enable DMA on transmitter and receiver */
	USART_TypeDef *uart_inst = (USART_TypeDef *)UART_BASE_ADDR;
	uart_inst->CR3 |= USART_CR3_DMAT;
	uart_inst->CR3 |= USART_CR3_DMAR;

	/*  Enable interrupt on RX IDLE */
	uart_inst->ICR |= USART_ICR_IDLECF;
	uart_inst->CR1 |= USART_CR1_IDLEIE;
	while ((uart_inst->ISR & USART_ISR_IDLE) != 0) {
	}

	IRQ_CONNECT(UART_IRQ, UART_IRQ_PRIORITY, uart_irq_handler, 0, 0);
	irq_enable(UART_IRQ);

	return ret
}

@erwango erwango assigned FRASTM and unassigned erwango Jun 7, 2021
@erwango erwango added this to the v2.7.0 milestone Jun 7, 2021
@cfriedt cfriedt added priority: low Low impact/importance bug and removed priority: low Low impact/importance bug labels Jul 12, 2021
@JonathanVanWin
Copy link
Contributor

What is the state of this issue?
Is somebody working on the double buffer mode?

@erwango
Copy link
Member

erwango commented Jul 13, 2021

@JonathanVanWin this is in our plans for this release if everything goes well (no excessive support requests aside), but not started so far. @FRASTM is out of office this week but will be back next week. You can sync with him at that time.

@JonathanVanWin
Copy link
Contributor

@FRASTM I would be happy to work together in order to make this work.
Should we continue communicating through this issue?
I already started checking what to do in order to improve the speed, so we should get in contact ASAP.

@roland-xz
Copy link

roland-xz commented Aug 24, 2021

I too am facing the same problem.
I'm using STM32F105 on 0d42e75 (v2.6.0+), and got a customer requirement to support 1Mbps.
Using the async api and a basically empty RX handler, with the double buffer swapping, bytes start dropping beyond 576000 baud. A non-empty RX handler will make it not so much better than the interrupt driven mode.
This is too low for my application, and I may have to go down the out of tree driver path.

NB. No hardware flow control because of hw design constraints.

@FRASTM
Copy link
Collaborator

FRASTM commented Aug 24, 2021

@JonathanVanWin, sorry for late reply. Now going back to double buffer circ. mode.
I have begun to change the dma driver in that way.
We can go on with stm32F1 serie.

@JonathanVanWin
Copy link
Contributor

Thanks, I have since then worked on it.
Got to 4MBaud with circular buffer, and software flow control. The buffer is given once to the driver, and then the read and write pointers are handled by the application. The async callback has now a new event to ask if it can continue writing, then lowers RTS.
I have written nice documentation about the software flow control handling, and can be added to a zephyr sample to show how to use the uart async circular buffer for 4 Mbaud.
This way we don't need to copy directly from buffer, because the read/write pointer handling.

@FRASTM
Copy link
Collaborator

FRASTM commented Aug 24, 2021

Good !
my concern was on the right size of the dma_memory buffer (the ones used to switch in a circ. mode). what should it be ?
I guess you have 2 buffers of the same size. Starting with buffer1, then in the irq, you switch to buffer2 by changing pointer. This is quite fast and then you are shiting new address by the adding buffer size. etc.
However the application has its own Rx buffer of its own size, so each double buffer is memcopy on its turn, to the application Rx buffer ?

@JonathanVanWin
Copy link
Contributor

I actually only use one buffer. Don't use the rx_buf_rsp API. Only use the buffer given to rx_enable.
Because I use circular DMA, I'm all the time reading from the uart data register, therefore we need to handle software flow control (also the stm32f411 has an errata with hardware fc that doesn't work properly if we brake the protocol).
Therefore I added to the uart bindings a new member: gpios, to store the RTS pin, and leave the CTS to be handled by hardware.
So now there is a new event, that is called every time after rx_rdy that checks if can continue writing to buffer, and here comes the tricky part:
You need to handle the read and write pointers externally, and if first half and second half of buffer is empty, and then check if the dma can continue writing to the buffer (without overwriting unread data). Because it takes time to pull the RTS pin high (till you get to that moment, you have passed from the dma callback, to uart callback to user callback), you need to calculate that delay (e.g. use logic analyzer or test different values).
And here comes the question about how to choose the buffer size:
It must be two times the size of the bytes that would be written in that delay (that changes when you change baudrate, so higher baudrate means bigger buffer, because you get more bytes in the same gpio delay). I can explain more in depth all the reasoning, but because I've already a working version with also documentation about the buffer handling (very nice, with ASCII art to explain the pointers), I will probably open a pull request when I have time.
Also, once the RTS pin is pulled high, I start a timer (the period is given in the Kconfig.stm32), and then starts sending events to user asking if can continue writing (read from buffer enough to not overwrite for next round).

@JonathanVanWin
Copy link
Contributor

Also, wanted to clarify, there is no need for external memcpy from application, that would be less convenient: because you have software flow control for RTS, and handle read write pointers, you can read from same buffer and process from there, and when you finished you can signal the driver.
We are testing the driver, and looks that for now it handles 4MBaud. Will do more stress testing, but I'm happy with the result.

@cfriedt cfriedt removed this from the v2.7.0 milestone Sep 28, 2021
@erwango
Copy link
Member

erwango commented Mar 21, 2022

For reader's information: No plan to implement this in STM32 driver before an UART API update is available.

@erwango erwango added the area: UART Universal Asynchronous Receiver-Transmitter label Aug 24, 2022
@durgababu2010
Copy link

@hjuul @cfriedt @shlomow @manoj153 @erwango

Title: Intermittent Data Loss in UART Async API on STM32H753ZI (Nucleo H753ZI)
Issue Description:

I am working with the STM32H753ZI (Nucleo H753ZI) board and using the UART asynchronous API provided by Zephyr. While I am able to receive data from the host, I am encountering an intermittent issue where the received data is not correctly processed. Occasionally, when reading the buffer to extract the packet, I observe that the data contains zeros, indicating a loss or misinterpretation of the received data.

Environment:
Board: STM32H753ZI (Nucleo H753ZI)
Zephyr Version: [3.5.0]
Toolchain: [default]

Followed Steps:

  1. Set up UART to receive data asynchronously and enabling the seamless transfer
  2. Enabling the callback
  3. sent the data from the host pc to board
  4. In the rx_rdy handler, copy the data to a ring buffer and signal a semaphore to notify a packet processing thread.
  5. The packet processing thread then reads from this buffer.

Expected Behavior:
The data read from the buffer should be consistent and accurate, reflecting the data sent from the host.

Observed Behavior:
The received data intermittently contains zeros, suggesting data loss or corruption. This happens randomly; sometimes the data is received and processed successfully, but other times it fails. The issue persists even though I am using the volatile qualifier.

I am seeking assistance in understanding and resolving this intermittent data loss issue with the UART async API on the STM32H753ZI board. Any insights or suggestions to debug this problem would be greatly appreciated.

@erwango
Copy link
Member

erwango commented Jan 11, 2024

Title: Intermittent Data Loss in UART Async API on STM32H753ZI (Nucleo H753ZI

Please open a dedicated issue

@cfriedt
Copy link
Member

cfriedt commented Jan 11, 2024

@hjuul - is there a way you could also provide reproducibility information with a specific app or test in-tree?

E.g. west build -b <board> tests/drivers/uart/...

@hjuul
Copy link
Contributor Author

hjuul commented Jan 12, 2024

@cfriedt: Are you referring to my original issue or the issue recently posted by durgababu2010 in this same thread?

As for my original issue it's been almost three years since I reported it; I have worked around it and moved on. But it seems from the discussion here that this issue is very well understood and has already been reproduced.

@durgababu2010
Copy link

@hjuul - is there a way you could also provide reproducibility information with a specific app or test in-tree?

E.g. west build -b <board> tests/drivers/uart/...

Intermittent Data Loss in UART Async API on STM32H753ZI (Nucleo H753ZI) ARM #67546
#67546

@erwango
Copy link
Member

erwango commented Dec 5, 2024

@hjuul Would you want to have a test to #80921 ?

@hjuul
Copy link
Contributor Author

hjuul commented Jan 3, 2025

@erwango Sorry, I can't prioritize this right now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: DMA Direct Memory Access area: UART Universal Asynchronous Receiver-Transmitter Enhancement Changes/Updates/Additions to existing features platform: STM32 ST Micro STM32
Projects
None yet
Development

No branches or pull requests

9 participants