Guidance on the software stack of an autonomous bot

Hi! Sorry, this is quite a long post with a lot of questions! I’ve been slowly working on a VR heritage preservation project for the past few years, and photogrammetry has been central to acquiring 3D models of these heritage sites.

However, taking photos for photogrammetry is a slow and painful process, so recently a few friends and I decided to build a robotics platform that would autonomously navigate interior spaces and allow external cameras to be attached for a high-quality reconstruction as a post-production step.

The current state of development [picture]
I have zero robotics experience, but we’ve managed to figure out the hardware and get it controllable (without ROS) for now. We’ve settled on:

  • An extruded aluminum frame
  • 4-wheel omni-wheel drive
  • DC motors with encoders as actuators
  • Arduino for motor control
  • Depth (Orbbec Astra) and LIDAR (RPLIDAR A1) sensors for SLAM
  • RPi 4B for ROS2
  • LiPo batteries

For now, I’ve tried reading the official ROS2 documentation and its packages’ docs, but I find the information rather fragmented for beginners, and I can’t build a mental overview of which package is used for what and how they connect in a node graph.

We have so far managed to get the hardware all set up and have the Arduino take button input to control the wheels in preset modes using rough open-loop control. The next step would be to make the bot remote-controllable using ROS2 and to implement closed-loop control. However, there are many more questions yet to be answered:

  • How would the motion control be done with ROS2?
    As I understand, the navigation stack that we’ll later use will command the vehicle movement using Twist messages on /cmd_vel. So, how do I go from /cmd_vel to wheel movement?

    • What is the process of translating robot movement into individual motor movement called in this context? Would this be classified as inverse kinematics?

    • How will the control system be distributed from the RPi to Arduino? A few thoughts:

      • The RPi interprets /cmd_vel into individual wheel RPMs, receives the encoder ticks, and has the ROS node do all the control-system logic and set the duty cycle of each wheel
      • The RPi interprets /cmd_vel into individual wheel RPMs and sends the target wheel angular velocities to the Arduino, which runs its own velocity PID loop to try to reach the target omega for each wheel
      • The RPi forwards /cmd_vel to the Arduino, which calculates the individual wheel omegas itself and uses closed-loop control to reach each wheel’s target speed
    • Also, wouldn’t any of the above control-system solutions lead to TWO control systems in the end? One for reaching the wheel velocities, and a later one for reaching a goal location by modifying the robot’s /cmd_vel? Is this how it’s normally done, or is this redundant?

    • Can you give a rough idea of how I can get this setup (simulating our motor controllers) working in Gazebo?

Would appreciate any tips and pointers. Thanks a lot!

Welcome to the community! Those are some great questions - I’ll have a crack at answering them but if anyone else has something to add, go for it :slight_smile:

Very interesting project!

Yep, the standard ROS approach would be to use Twist (or possibly TwistStamped) messages, and this will integrate nicely with libraries like Nav2 and teleop tools.

Ooh, big question. Many aspects of this will be outlined in the answers below, but one general thing to touch on is the overall structure of the solution. The most straightforward way will be to write a single ROS node that takes /cmd_vel in and sends something to the Arduino (depending on the answers to your questions below), and that’s probably a great way to start.
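As a very rough illustration of that single-node approach (assuming rclcpp; `allocate()` and `send_to_arduino()` are hypothetical placeholders for your own wheel maths and serial protocol), it could look something like this:

```cpp
#include <array>
#include <memory>

#include "rclcpp/rclcpp.hpp"
#include "geometry_msgs/msg/twist.hpp"

// Sketch of the "single node" approach: listen to /cmd_vel and forward
// per-wheel targets to the Arduino over whatever serial protocol you define.
class CmdVelBridge : public rclcpp::Node
{
public:
  CmdVelBridge() : Node("cmd_vel_bridge")
  {
    sub_ = create_subscription<geometry_msgs::msg::Twist>(
      "cmd_vel", 10,
      [this](const geometry_msgs::msg::Twist::SharedPtr msg) {
        auto wheel_targets = allocate(msg->linear.x, msg->linear.y, msg->angular.z);
        send_to_arduino(wheel_targets);
      });
  }

private:
  // Placeholder: replace with your own omni-wheel allocation maths.
  std::array<double, 4> allocate(double vx, double vy, double wz)
  {
    (void)vx; (void)vy; (void)wz;
    return {0.0, 0.0, 0.0, 0.0};
  }

  // Placeholder: write the four wheel targets out over serial to the Arduino.
  void send_to_arduino(const std::array<double, 4> & w)
  {
    RCLCPP_INFO(get_logger(), "wheel targets: %.2f %.2f %.2f %.2f", w[0], w[1], w[2], w[3]);
  }

  rclcpp::Subscription<geometry_msgs::msg::Twist>::SharedPtr sub_;
};

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  rclcpp::spin(std::make_shared<CmdVelBridge>());
  rclcpp::shutdown();
  return 0;
}
```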

An alternative is to use the ros2_control framework. The learning curve is fairly steep, but it’s a powerful tool. My next two videos will be on it, so maybe keep an eye out for them. With this framework, you write a controller which converts the /cmd_vel into the velocities of some abstract wheels, and also a hardware interface which takes the abstract wheel velocities and sends the appropriate signals to the hardware. This makes your code much more modular - you can share it with others and also make future changes more easily.
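To give a sense of the hardware-interface half, here’s a rough skeleton against the Humble-era API - the class name and comments are my own placeholders, the pluginlib export macro and plugin XML are left out, and the exact API shifts a little between distros:

```cpp
// Rough shape of a ros2_control hardware interface (Humble-era API; names are
// made up, and the pluginlib export macro and plugin XML are omitted).
#include <vector>

#include "hardware_interface/system_interface.hpp"
#include "hardware_interface/types/hardware_interface_type_values.hpp"
#include "rclcpp/duration.hpp"
#include "rclcpp/time.hpp"

class OmniBotHardware : public hardware_interface::SystemInterface
{
public:
  hardware_interface::CallbackReturn on_init(const hardware_interface::HardwareInfo & info) override
  {
    if (SystemInterface::on_init(info) != hardware_interface::CallbackReturn::SUCCESS) {
      return hardware_interface::CallbackReturn::ERROR;
    }
    pos_.resize(info_.joints.size(), 0.0);
    vel_.resize(info_.joints.size(), 0.0);
    cmd_.resize(info_.joints.size(), 0.0);
    // Open the serial connection to the Arduino here.
    return hardware_interface::CallbackReturn::SUCCESS;
  }

  std::vector<hardware_interface::StateInterface> export_state_interfaces() override
  {
    std::vector<hardware_interface::StateInterface> state;
    for (size_t i = 0; i < info_.joints.size(); ++i) {
      state.emplace_back(info_.joints[i].name, hardware_interface::HW_IF_POSITION, &pos_[i]);
      state.emplace_back(info_.joints[i].name, hardware_interface::HW_IF_VELOCITY, &vel_[i]);
    }
    return state;
  }

  std::vector<hardware_interface::CommandInterface> export_command_interfaces() override
  {
    std::vector<hardware_interface::CommandInterface> cmd;
    for (size_t i = 0; i < info_.joints.size(); ++i) {
      cmd.emplace_back(info_.joints[i].name, hardware_interface::HW_IF_VELOCITY, &cmd_[i]);
    }
    return cmd;
  }

  hardware_interface::return_type read(const rclcpp::Time &, const rclcpp::Duration &) override
  {
    // Read encoder feedback from the Arduino and update pos_ / vel_.
    return hardware_interface::return_type::OK;
  }

  hardware_interface::return_type write(const rclcpp::Time &, const rclcpp::Duration &) override
  {
    // Send the commanded wheel velocities (cmd_) to the Arduino.
    return hardware_interface::return_type::OK;
  }

private:
  std::vector<double> pos_, vel_, cmd_;
};
```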

In the past I’ve heard this referred to as control allocation, or otherwise just considered part of the controller. I wouldn’t really class it as inverse kinematics, which typically describes calculating the parameters (e.g. angles) of individual joints in a chain (such as a robot arm) to place the end effector at a desired location. Given that you are using omni-wheels, you’ll probably have to implement the control allocation (from Twist to wheel velocities) yourself.
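For reference, if your wheels are in the common four-mecanum “X” arrangement, the allocation works out to something like the sketch below - but the signs depend entirely on how your wheels and rollers are actually mounted, so treat it as a starting point to check against your own derivation:

```cpp
#include <array>

// Body velocities (vx, vy in m/s; wz in rad/s) -> wheel angular velocities (rad/s)
// for a four-mecanum-wheel "X" arrangement. r is the wheel radius; lx and ly are
// the half-distances from the robot centre to the wheels along x and y.
std::array<double, 4> twistToWheelVels(double vx, double vy, double wz,
                                       double r, double lx, double ly)
{
  const double k = lx + ly;
  return {
    (vx - vy - k * wz) / r,  // front-left
    (vx + vy + k * wz) / r,  // front-right
    (vx + vy - k * wz) / r,  // rear-left
    (vx - vy + k * wz) / r   // rear-right
  };
}
```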

All of these are theoretically possible options. I’d steer clear of number one as you do NOT want your comms interfering with your PID loop. Number three is totally viable and may be preferable in some situations, but I personally would go for number two.

The second approach keeps the “low-level” stuff on the Arduino side and the higher-level stuff on the Pi side. This may help with reusability:

  • In the Arduino code (e.g. if you swapped to a different control scheme like differential drive or Ackermann, the Arduino code can stay the same)
  • In the ROS code (e.g. if you built a new platform that used a different motor driver/controller, no changes required to your control allocation).

This modularity ties in with the ros2_control stuff from earlier (but the same concept applies even without ros2_control).

For what it’s worth, the robot I’m building in the YouTube videos takes a slightly odd approach due to the Arduino code I’m using (which I found online). In this case the Pi goes as far as calculating and sending encoder counts per Arduino PID loop, which the Arduino then takes as a reference. This means the Arduino code doesn’t even need to know the encoder CPR (but the Pi needs to know the Arduino’s loop rate…). Instead you could draw the line at encoder counts per second, or a normal unit such as rad/s or RPM. I think I’d probably go with rad/s or rev/s and have a way to tell the Arduino the encoder CPR (either via serial or hardcoded).
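To make the Arduino side a bit more concrete, here’s a minimal single-wheel velocity loop that takes a rad/s setpoint and knows the CPR - the CPR value, loop rate, gains and the pin/serial handling are all placeholders:

```cpp
// One-wheel velocity PI loop on the Arduino, taking a rad/s setpoint from the Pi.
// CPR, loop rate, gains and pin/serial handling are placeholder values.
const float CPR = 1320.0;        // encoder counts per output-shaft revolution (example)
const float LOOP_DT = 0.02;      // 50 Hz control loop
const unsigned long LOOP_MS = 20;

volatile long encoderCount = 0;  // incremented in an encoder ISR (not shown)
long lastCount = 0;
float targetRadPerSec = 0.0;     // updated from serial commands (not shown)
float integral = 0.0;
float kp = 2.0, ki = 5.0;        // tune for your motors
unsigned long lastLoop = 0;

void setup() {
  Serial.begin(115200);
  // pinMode()/attachInterrupt() for the motor driver and encoder would go here
}

void loop() {
  if (millis() - lastLoop < LOOP_MS) return;
  lastLoop = millis();

  // Measured wheel speed from the ticks accumulated since the last loop
  long count = encoderCount;
  float measuredRadPerSec = (count - lastCount) / CPR * TWO_PI / LOOP_DT;
  lastCount = count;

  // Simple PI controller on wheel velocity
  float error = targetRadPerSec - measuredRadPerSec;
  integral += error * LOOP_DT;
  float output = kp * error + ki * integral;

  // Map 'output' to a PWM duty cycle and direction, clamped to a valid range, e.g.
  // analogWrite(MOTOR_PWM_PIN, constrain(abs(output), 0, 255));
}
```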

Correct, and I’d say that’s a fairly normal solution. I think the easiest way to see that it is not redundant is that we are controlling two different things. The inner loop controls the wheel velocities and the outer loop controls the robot position - the wheels themselves are somewhat abstracted away. If the outer loop were attempting to control the wheel positions while the inner loop controlled their velocities, there would be some redundancy. But even full control of the wheel positions is not enough to control the robot position - if we used a position controller to turn the left wheel a full revolution and then the right wheel, that would leave the robot somewhere different to turning the right wheel before the left, but the controller would not know the difference.
So I see that as a totally sensible solution.

That’s a tricky one. I actually have no experience with omni-wheels (just never got around to it, always meant to), and so definitely no experience using them in Gazebo. I did some very quick searching and the results I saw backed up the three approaches that came to mind:

  • Fully model the wheels in Gazebo, with little rolling cylinders
  • Utilise the Gazebo friction model with standard cylindrical wheels to zero out the friction coefficient on one axis and set it appropriately on the other (I’d try this first to see if it can work)
  • Create a Gazebo model that can move arbitrarily in the plane (not 100% sure how to do this) and either drive it directly from your /cmd_vel, or (better) add a fake “omni model” layer that takes the output of your omni controller and maps it appropriately, so you are at least testing your controller even if the sim is a bit fake.

Apparently this GitHub repo might be worth taking a look at. See also this and this.

Actually controlling the wheels in Gazebo is another matter. You could write your own plugin, but I’d urge you to consider using ros2_control. As I mentioned earlier, the learning curve is a bit steep but the final result should work quite nicely as you’ll keep your controller the same, but use the Gazebo hardware interface instead of your own.


Long answer to a long question but I hope it’s helpful - it was a great first big question for the forum! Feel free to ask for clarification on any of those points or other parts of the project, and if anyone else has some tips hopefully they can add them below.

Good luck with the project - I’m keen to see how it turns out :smiley:

Thanks for the quick and detailed answer! I think I will opt to go without ros2_control for now, since I have already derived some equations for the wheel movement and want to see if they work. I do see the need for ros2_control though, and I will pick it up in parallel.

We’ve actually already set up the model for Gazebo based on your videos, using the friction-based approach, and it works totally fine. The question really is how we control the virtual wheels, but if it involves writing a custom plugin then I’d rather wait until I start the ros2_control rewrite!

I do have some more questions on SLAM and navigation, more out of curiosity since that’s a lot of steps ahead - our robot might have to traverse tight interior spaces, and given its size and especially its height, an RPLIDAR alone is not going to cut it for obstacles that aren’t visible at ground level or for height-clearance issues.

Ideally, we could have a 2D LIDAR for a rough 360° overview of the scene and a 3D depth sensor for building a better representation of nearby obstacles. Perhaps it would be even better to have multiple depth sensors for better coverage, but I’m not sure if the path-planning nodes out there will take in arbitrary combinations of these sensors.

Also, I can only hope there is some automatic sensor calibration to find the extrinsics; finding the position of each sensor manually sounds quite painful.

Anyway, I’ll work on the motion control a bit more first, and I’m looking forward to the upcoming videos :slight_smile:

Yeah I think you do need to write a custom plugin (though I may be wrong). Here is the source for the bundled ones as a reference if you want to have a crack, but as you say it may be better to wait.

In the meantime you might be able to make something “good enough” using the provided plugins. For example, you can probably use the planar move plugin to bypass the wheel control and dynamics altogether (it converts a Twist directly into movement). This would let you start experimenting with SLAM and such in Gazebo, but it obviously misses out on testing your control scheme. You may also be able to exert forces on the wheels, etc.

Re the SLAM stuff, I’ll be covering that at some point in the series, but that’s probably a couple of months away at this rate. I agree with your plan of a 2D lidar + depth camera (or similar), and you’re right that integrating that into a navigation stack will be more complex. It’s my understanding that Nav2 (which I presume you’ll use) can handle it - it just might take a bit more setting up compared to simply inflating a SLAM gridmap. I think you could either fuse all the information into a single costmap, or create separate layers per sensor type.

Sensor calibration is definitely a tricky one. I’m not aware of any automatic processes, but I’m sure they’re out there (I do recall this talk, not sure if it’s helpful or not).

Heng, welcome to the community - I love the thought you put into the write-up. Josh, your thoughtful and detailed answers will not only help Heng but also me and others who hope to do similar things.

I’m still dealing with long COVID, so I’m not able to do much, but I will follow the thread and contribute if and when I can.

Eric