RS485 random disconnect problem

Postby icemadsen » Sat Apr 21, 2012 10:58 am

Hi!

We have been working on optimising the flow of communication in our sorting machine (see http://www.youtube.com/watch?v=Bet_QDWNFCY). Instead of only using Bluetooth connections between the NXTs, we have switched some of the connections to RS485, in order to avoid switching between Bluetooth connections, as that takes a lot of time.

However, when the system is running with RS485 connections, we get an IOException when flushing the RS485 connection. After digging deep into the source code using RConsole, we found that the RS485 connection is being disconnected at line 733 in RS485.java (release 0.9.1), so it seems that a timeout occurs. However, it occurs after a random amount of time.

We have been looking at this for some time now, and we are stuck on how to resolve the error.

We have not been able to create a simple test scenario that replicates the error.

Do you have any suggestions on what to do from here? Is this a known error?

Thanks!
Lasse and Kenneth - BrickIt.dk
icemadsen
Novice
 
Posts: 31
Joined: Tue Dec 08, 2009 11:41 am

Re: RS485 random disconnect problem

Postby gloomyandy » Sat Apr 21, 2012 2:54 pm

Sorry, I don't know of any issues with the RS485 code. However, I suspect that this code is not used that much, so there may be some sort of problem. The only thing I can suggest is that you may have some sort of flow control issue. To avoid this sort of problem, I always try to make the protocols I use self-regulating by using a send/response style of communication. For instance, NXT A sends a message to NXT B and then waits for a response from B before sending any more data to B. This way you do not end up in a situation where B gets flooded with messages from A...
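
As a rough illustration of the send/response style described above, the following Java sketch lets A send one message and then block until B replies. It runs on a desktop JVM over in-memory pipes rather than on NXT bricks. The class name, the method names, and the one-byte reply code are assumptions made for this example and are not part of leJOS.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

// Hypothetical desktop simulation; none of these names come from leJOS.
class LockstepDemo {
    static final int REPLY_OK = 1; // assumed one-byte reply code

    /** A's side: send one message, then block until B's reply arrives. */
    static int sendAndWait(DataOutputStream out, DataInputStream in, String msg)
            throws IOException {
        out.writeUTF(msg);
        out.flush();
        return in.readUnsignedByte(); // nothing else is sent until this returns
    }

    /** Runs A and B over in-memory pipes; returns the number of replies A got. */
    static int run(int messages) {
        try {
            PipedOutputStream aOut = new PipedOutputStream();
            PipedInputStream bIn = new PipedInputStream(aOut);
            PipedOutputStream bOut = new PipedOutputStream();
            PipedInputStream aIn = new PipedInputStream(bOut);

            // B's side: read exactly one message, reply, repeat.
            Thread b = new Thread(() -> {
                try (DataInputStream in = new DataInputStream(bIn);
                     DataOutputStream out = new DataOutputStream(bOut)) {
                    for (int i = 0; i < messages; i++) {
                        in.readUTF();
                        out.writeByte(REPLY_OK);
                        out.flush();
                    }
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            b.start();

            int replies = 0;
            try (DataOutputStream out = new DataOutputStream(aOut);
                 DataInputStream in = new DataInputStream(aIn)) {
                for (int i = 0; i < messages; i++) {
                    if (sendAndWait(out, in, "msg " + i) == REPLY_OK) {
                        replies++;
                    }
                }
            }
            b.join();
            return replies;
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("replies received: " + run(5));
    }
}
```

Because A never has more than one message outstanding, B cannot be flooded no matter how slowly it reads. The cost is one round trip per message.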

If you can reproduce the problem with some sort of test program I will try and work out what is going wrong, other than that, you may just have to try and add more debug code into the classes to work out what is going on... If you can post a more complete description of how you are using RS485 (how many NXTs you have connected together and how they talk to each other), then I will try and help as much as I can.

Good luck

Andy
gloomyandy
leJOS Team Member
 
Posts: 3003
Joined: Fri Sep 28, 2007 2:06 pm
Location: UK

Re: RS485 random disconnect problem

Postby icemadsen » Sat Apr 21, 2012 7:24 pm

Hi Andy

For the sorting system we are using 7 NXTs.
The old communication layout (with only Bluetooth connections) was:
A PC was connected to NXT_A, NXT_B, NXT_C and NXT_D
NXT_A was further connected to NXT_E and NXT_F
NXT_B was further connected to NXT_G

The new communication layout is intended to be:
A PC is connected to NXT_A, NXT_C and NXT_D (Bluetooth)
NXT_C is further connected to NXT_E (RS485)
NXT_D is further connected to NXT_F (RS485)
NXT_A is further connected to NXT_G (RS485)
NXT_G is further connected to NXT_B (Bluetooth)
So the new communication layout consists of 3 "communication chains" from the PC:
PC -> (BT) -> NXT_C -> (RS) -> NXT_E
PC -> (BT) -> NXT_D -> (RS) -> NXT_F
PC -> (BT) -> NXT_A -> (RS) -> NXT_G -> (BT) -> NXT_B

We ran into loss of data over Bluetooth at the beginning of this project (1½ years ago) and therefore developed a reliable communication protocol on top of the Bluetooth communication. It is basically an implementation of the reliable transmission part of the TCP protocol. We associate each packet with a sequence number when sending it and expect an acknowledgement from the receiver for that packet before sending the next one. If no acknowledgement has been received within a specific amount of time, the packet is resent.
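
The scheme described above (one sequence number per packet, wait for the acknowledgement, resend on timeout) can be sketched roughly as follows. This is a single-threaded desktop simulation, not the posters' actual protocol code. The class and method names are hypothetical, and a "timeout" is modelled simply as a dropped frame or dropped acknowledgement at a chosen transmission index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simulation of stop-and-wait with retransmission.
class StopAndWaitDemo {
    /** Receiver: delivers each sequence number once and acknowledges every frame. */
    static class Receiver {
        final List<String> delivered = new ArrayList<>();
        private int expectedSeq = 0;

        int onFrame(int seq, String payload) {
            if (seq == expectedSeq) {   // new frame: hand it to the application
                delivered.add(payload);
                expectedSeq++;
            }                           // duplicate: payload is ignored
            return seq;                 // the ack echoes the sequence number
        }
    }

    /**
     * Sends payloads one at a time, retransmitting until each is acknowledged.
     * lostFrames and lostAcks hold the indices (0-based, counted over all
     * transmissions) at which the simulated link drops the frame or its ack;
     * a drop stands in for the sender's timeout expiring.
     * Returns the total number of transmissions used.
     */
    static int sendAll(List<String> payloads, Receiver rx,
                       Set<Integer> lostFrames, Set<Integer> lostAcks,
                       int maxRetries) {
        int transmissions = 0;
        for (int seq = 0; seq < payloads.size(); seq++) {
            boolean acked = false;
            for (int attempt = 0; attempt <= maxRetries && !acked; attempt++) {
                int tx = transmissions++;
                if (lostFrames.contains(tx)) continue;   // frame never arrived
                int ack = rx.onFrame(seq, payloads.get(seq));
                if (lostAcks.contains(tx)) continue;     // ack never arrived
                acked = (ack == seq);
            }
            if (!acked) {
                throw new IllegalStateException("gave up on seq " + seq);
            }
        }
        return transmissions;
    }

    public static void main(String[] args) {
        Receiver rx = new Receiver();
        int sent = sendAll(Arrays.asList("a", "b", "c"), rx,
                new HashSet<>(Arrays.asList(1)),   // transmission 1: frame lost
                new HashSet<>(Arrays.asList(3)),   // transmission 3: ack lost
                3);
        System.out.println(rx.delivered + " delivered using " + sent + " transmissions");
    }
}
```

In the example run, three payloads need five transmissions. The lost acknowledgement causes a duplicate frame, which the receiver acknowledges again but does not deliver twice.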

The developed communication protocol consists of two threads: a reader thread and a sender thread. The protocol has been working flawlessly over a Bluetooth connection.

We have therefore extended the protocol to be used over an RS485 connection, as this would ease the changes in the software.

As for your suggestion of using a send/response style of communication, we may need to do this, as it seems like the easiest solution. We are using an RS485 connection on our BrickIt Blimp (the BrickIt Sky Cam), which is based on send/response communication, and there have not been any problems with that.

So it seems it is only when we are sending and receiving at the same time, that the error occurs.

We will try to see if we can get more information on why the error occurs.

- Lasse and Kenneth

Re: RS485 random disconnect problem

Postby skoehler » Sat Apr 21, 2012 9:49 pm

RS485 is half-duplex. Please ask yourself whether it is possible for two NXTs attached to the same RS485 link to send data at the same time.
If that is possible, then your design is wrong. Usually, since RS485 is half-duplex, one of the NXTs is the master while all other NXTs connected to the RS485 link (you can connect multiple NXTs to the same RS485 link) are slaves. The slaves only reply to commands from the master, and the master waits for the slave's reply before it sends another request to a slave. Bluetooth, by contrast, implements a full-duplex link.

Also, RS485 does not implement any flow control whatsoever. So if the sender is sending data faster than the receiver reads it, buffer overflows will be the result. I'm not sure whether you're using the version of leJOS in which Andy added the buffer overflow recognition (which results in an IOException). Just make sure never to send more data than the RS485 buffer of the receiver can hold. I'm not sure how large the buffer is by default, but you can adjust the buffer size in recent versions of leJOS. Bluetooth implements flow control, hence the sender is slowed down when the receiver is too slow.
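
To illustrate the rule of never sending more than the receiver's buffer can hold, here is a small Java sketch that splits a payload into chunks no larger than the buffer and waits for the receiver to drain each chunk before sending the next. It is a desktop simulation only. The 128-byte capacity is an arbitrary value chosen for the example (the thread does not state the real default), and all class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical simulation of a fixed-size receive buffer without flow control.
class ChunkedSendDemo {
    static class BoundedReceiver {
        private final byte[] buf;
        private int used = 0;
        private final ByteArrayOutputStream consumed = new ByteArrayOutputStream();

        BoundedReceiver(int capacity) {
            buf = new byte[capacity];
        }

        /** Bytes arriving from the link; fails if the buffer would overflow. */
        void receive(byte[] data, int off, int len) {
            if (used + len > buf.length) {
                throw new IllegalStateException("receive buffer overflow");
            }
            System.arraycopy(data, off, buf, used, len);
            used += len;
        }

        /** The application reads everything buffered so far, freeing the buffer. */
        int drain() {
            consumed.write(buf, 0, used);
            int n = used;
            used = 0;
            return n;
        }

        byte[] consumed() {
            return consumed.toByteArray();
        }
    }

    /**
     * Sends data in chunks of at most chunkSize bytes. The call to drain()
     * stands in for waiting until the receiver reports that it has read the
     * chunk. Returns the number of chunks sent.
     */
    static int sendChunked(byte[] data, BoundedReceiver rx, int chunkSize) {
        int chunks = 0;
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            rx.receive(data, off, len);
            rx.drain();
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) {
        int capacity = 128; // assumed buffer size, for illustration only
        byte[] data = new byte[300];
        BoundedReceiver rx = new BoundedReceiver(capacity);
        int chunks = sendChunked(data, rx, capacity);
        System.out.println(chunks + " chunks, " + rx.consumed().length + " bytes consumed");
    }
}
```

Sending the same 300 bytes in a single call to receive() would throw, which mirrors the overflow case described above.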

As you can see, RS485 is much, much simpler than Bluetooth. And if you didn't heavily restructure your communication protocols, it is very likely that you're doing something that RS485 does not support.
skoehler
leJOS Team Member
 
Posts: 1110
Joined: Thu Oct 30, 2008 4:54 pm

Re: RS485 random disconnect problem

Postby gloomyandy » Sat Apr 21, 2012 9:55 pm

Hi Sven,
I'm pretty sure they are using the higher level BitBus code, that handles error detection and retransmission. It also has some degree of flow control and buffering, though nothing very sophisticated...

Andy

Re: RS485 random disconnect problem

Postby skoehler » Sun Apr 22, 2012 12:17 am

gloomyandy wrote:Hi Sven,
I'm pretty sure they are using the higher level BitBus code, that handles error detection and retransmission. It also has some degree of flow control and buffering, though nothing very sophisticated...

The bit where they say that they have 2 Threads per connection sure looks like the way you handle a full-duplex connection. For a half-duplex connection (e.g. using BitBus), you'd probably send a command and wait for the reply on the same Thread. I might be wrong though. I have not seen any code or exception traces in this thread. So I just took a stab in the dark.

Re: RS485 random disconnect problem

Postby gloomyandy » Sun Apr 22, 2012 7:30 am

The line number they reported the exception being thrown at earlier on in the thread is in the BitBus code. When using BitBus you can treat the connection as full duplex, the lower level code takes care of turning it into a half duplex channel by only sending the data to the master when the master polls the slave (which it does in a round robin fashion).
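
The polling idea described above can be sketched in Java as follows: each slave may queue data at any time, but it only puts something on the bus when the master polls it, and the master visits the slaves in round-robin order. This is not the actual BitBus implementation in leJOS; the class names, the "ACK" placeholder for an empty reply, and the string-based bus log are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical simulation of master/slave polling on a half-duplex bus.
class PollingBusDemo {
    static class Slave {
        final String name;
        private final Deque<String> outbox = new ArrayDeque<>();

        Slave(String name) {
            this.name = name;
        }

        /** The application may call this at any time; nothing is sent yet. */
        void queue(String msg) {
            outbox.add(msg);
        }

        /** Called only when the master polls this slave. */
        String onPoll() {
            String msg = outbox.poll();
            return msg == null ? "ACK" : msg; // empty reply keeps the link alive
        }
    }

    /** The master polls every slave once per round and logs what each one sent. */
    static List<String> pollRounds(List<Slave> slaves, int rounds) {
        List<String> bus = new ArrayList<>();
        for (int r = 0; r < rounds; r++) {
            for (Slave s : slaves) {
                bus.add(s.name + ":" + s.onPoll());
            }
        }
        return bus;
    }

    public static void main(String[] args) {
        Slave e = new Slave("E");
        Slave f = new Slave("F");
        e.queue("x");
        e.queue("y");
        f.queue("z");
        System.out.println(pollRounds(Arrays.asList(e, f), 2));
    }
}
```

From the application's point of view each slave looks full duplex, because queue() never blocks on the bus. On the wire, only one device transmits at a time.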

Looking at the code again I can't really see why the connection is being dropped. Even if no user data can be transferred there should be low level acks being sent between the two bricks. The only thing I can think of is that the low level I/O thread is not able to run for long enough/often enough to respond in time. Perhaps increasing the timeout periods may help... Looking at the setup things are pretty complex and I suspect that this code has not been used in such a setup before.
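
To show why a longer timeout might help if the low level I/O thread is occasionally delayed, here is a minimal Java sketch of a link watchdog that declares the connection dead when no frame has been seen within a timeout. The thread does not show the real timeout logic or values in RS485.java, so the class, the method names, and the millisecond figures below are purely illustrative assumptions.

```java
// Hypothetical link watchdog; timestamps are passed in so the demo is deterministic.
class LinkWatchdogDemo {
    static class LinkWatchdog {
        private final long timeoutMs;
        private long lastHeardMs;

        LinkWatchdog(long timeoutMs, long nowMs) {
            this.timeoutMs = timeoutMs;
            this.lastHeardMs = nowMs;
        }

        /** Record that a frame (data or low level ack) arrived at nowMs. */
        void onFrame(long nowMs) {
            lastHeardMs = nowMs;
        }

        /** True if the peer has been silent for longer than the timeout. */
        boolean expired(long nowMs) {
            return nowMs - lastHeardMs > timeoutMs;
        }
    }

    /**
     * Replays frame arrival times (in ms, ascending) against a watchdog that
     * starts at time 0. Returns the arrival time at which the link would
     * already have been dropped, or -1 if it survives the whole sequence.
     */
    static long firstTimeout(long timeoutMs, long[] frameTimesMs) {
        LinkWatchdog watchdog = new LinkWatchdog(timeoutMs, 0);
        for (long t : frameTimesMs) {
            if (watchdog.expired(t)) {
                return t;
            }
            watchdog.onFrame(t);
        }
        return -1;
    }

    public static void main(String[] args) {
        // One 170 ms gap, e.g. because the I/O thread was not scheduled in time.
        long[] arrivals = {40, 80, 120, 290, 330};
        System.out.println("timeout 100 ms: dropped at " + firstTimeout(100, arrivals));
        System.out.println("timeout 250 ms: dropped at " + firstTimeout(250, arrivals));
    }
}
```

With the shorter timeout, a single scheduling hiccup is enough to drop the link at a seemingly random moment. With the longer one, the same arrival pattern survives.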

Andy