This article presents what is meant to be the simplest possible example of using the PRU (programmable realtime unit) on the BeagleBone Black single-board computer. The example program has no inputs and no outputs; it does nothing other than delay for a fixed duration, then exit.

  • What is the BeagleBone Black? It's a tiny single-board computer that's usually used to run Linux. Surprisingly capable for the US$45 price tag, it's got lots of general-purpose IO, HDMI video output and on-board flash storage. Lots of specs and additional information can be had from the manufacturer's website.
  • What's a PRU? The PRU (programmable realtime unit, also known as the PRU-ICSS or PRUSSv2) is a subsystem of the AM335x ARM Cortex-A8 processor on the 'bone. It is an independent CPU with its own memory and instruction set. It can run its own program, completely independent of the Linux kernel on the main CPU. It's fast (200MHz clock), all the instructions take known constant times and you have it all to yourself, so you can use it for things that require a hard realtime response. The 'bone has two PRUs.

If you are just getting started with the BeagleBone Black, the information here is not going to be terribly helpful -- this is not a fun or instructive first project.

However, if you're running into a situation where just banging bits into the GPIO registers using memory-mapped IO is too slow or too imprecise for the particular hardware you're trying to talk to, then using the PRU might solve your problem without requiring extra off-board hardware.

There's lots of great information about using the PRU available on the 'net -- please see the References section, below. However, in a lot of cases it's overload. Many of the examples are doing something fairly complicated. Some of the information you need is buried in a 20MB processor reference document, or in the middle of a long forum thread.

The goals here are:

  • Make simple stuff simple. I want to have all the information I need to start working with the PRU, organized in a reasonable way, in one place. And who knows, maybe it'll help somebody else.
  • Get a development environment set up and in a known-working state. Working with the PRU is hard enough without having to worry about problems with your tools or fundamental approach.
  • See a result (even if it isn't one that's practical or especially exciting). I've previously mentioned "Fail early, fail often." It boils down to wanting to learn about problems early before I invest lots of time in an approach that'll never work. That means being able to see something real happen at lots of intermediate steps on the way to a goal.

I'm specifically and deliberately avoiding dealing with any of the device-tree stuff you'll need to worry about if you want to do real work with the PRU. (But I do intend to cover that in a future post.)

Advantages and Disadvantages

One of the main points I'd like to make up front is that using the PRU -- even in a simple way -- is kind of a butt-pain. If you can avoid it, you probably should (unless you just want to learn, in which case: go for it!). For most projects, the PRU is overkill. If you're just trying to flash an LED or measure the temperature or something, there are much easier ways to do it. If you're thinking about adding an off-board microcontroller (or FPGA) because you can't get tight enough timing otherwise, using the PRU just might be a better alternative.

What do you gain by using the PRU?

  • speed -- You get an entire processor to dedicate to a single task; it isn't just getting time slices in a preemptive multitasking system.
  • predictability -- The PRU instruction set is relatively small, every instruction takes a known, fixed number of clock cycles, and you're not going to swap context or wait for memory paging in the middle of your delicately-tuned timing loop.
  • offloading -- If you can do something CPU-intensive on the PRU, it's not going to be bogging down the Linux userland and making interactive performance sluggish.
  • fast IO -- Many of the pins have special IO modes for direct access by the PRU. These work much faster than memory-mapped IO from the main processor.

What does using the PRU cost you?

  • no GCC -- Update: TI has made available a PRU C compiler as part of their "Code Composer Studio" (CCS) IDE. However, it is not free as in speech, only free as in beer for limited use, requires registration, and is ITAR regulated. The registration form expects you to be a company or university, not an individual. They also want to know details of what you're using it for. It is possible to download just the TI PRU tools, without having to download CCS or use the CCS GUI. There's a great writeup of how to actually use the TI compiler from Fabien Le Mentec at EmbeddedRelated.com.
  • extra development overhead -- You'll need to install some additional tools, headers and libraries, above and beyond what you need for normal userland software development.
  • hardware resources -- The 'bone only has two PRUs, and they can't be virtualized (at least not without sacrificing all the advantages of using a PRU in the first place).
  • security -- To load code into the PRU from user space, you need to be root. While you can certainly develop secure applications that use the PRU, it requires careful thought. There's no kernel enforcing the rules in PRU-land.
  • learning curve -- Presumably, the whole reason you're reading this page, nyet?

Assumptions

I'm going to assume that the reader is an experienced Linux user and C programmer.

The Ångström distro gives me hives. Just trying to figure out how to set a static IP with connman made me want to punch a baby. Busybox is a great piece of software, but if I'm using it, it's because I have to, not because I want to. So: I've put Debian on my 'bone, and some of my instructions are Debian-specific (like using apt-get to install things). If you use something else, that's fine, but you'll have to adapt what's here to your particular system.

All of my examples involve developing natively on the BeagleBone Black itself. Cross-compiling on your big desktop system is certainly an option, but it adds an extra layer of complexity and the 'bone is fast enough that it isn't really necessary.

You'll need root. You may need physical access to the 'bone (to power cycle it if you screw up badly enough to wedge it -- unlikely but possible).

Finally, I assume that you have a basic native development environment on the 'bone, you can compile and link C programs and that you have things like "make" and a reasonable text editor handy.

Development Tools

To write programs that use the PRU, you'll need to install the pasm assembler, the libprussdrv library and the header files for that library. While there isn't (at the time of this writing) a convenient Debian package you can install with apt-get, doing it by hand isn't terribly hard:

  1. Get a copy of the am335x_pru_package software from the GitHub repository. There are various ways to do this, but the simplest is probably to use wget from the command line to download the package tree as a .zip file from https://github.com/beagleboard/am335x_pru_package/archive/master.zip. Note: If wget complains about problems checking the site certificate, it is likely because you don't have the root certificates installed. You can either use the --no-check-certificate option or (much better) just: sudo apt-get install ca-certificates
  2. If you downloaded the archive (as opposed to cloning the git tree), unpack it somewhere under your home directory.
  3. Make a new directory /usr/include/pruss/ and copy the files prussdrv.h and pruss_intc_mapping.h into it (from am335x_pru_package-master/pru_sw/app_loader/include). Check the permissions; if you used the .zip file, these headers will likely have the execute bits on. It doesn't really hurt anything, but it is certainly not what you want.
  4. Change directory to am335x_pru_package-master/pru_sw/app_loader/interface then run: CROSS_COMPILE= make (note the space between the = and the command).
  5. The previous step should have created four files in am335x_pru_package-master/pru_sw/app_loader/lib: libprussdrv.a, libprussdrvd.a, libprussdrvd.so and libprussdrv.so. Copy these all to /usr/lib then run ldconfig.
  6. Change directory to am335x_pru_package-master/pru_sw/utils/pasm_source then run source linuxbuild to create a pasm executable one directory level up. Copy it to /usr/bin and make sure you can run it. If you invoke it with no arguments, you should get a usage statement.

There are certainly other places you can install stuff if you feel strongly about it. Using /usr/bin, /usr/lib and /usr/include is simple and works.

I suggest you keep the tree where you unpacked the am335x_pru_package; it contains documentation and example code you'll probably want to look at later.

Enable the PRU

Before we can use the PRU, we need to enable it, in much the same manner as we would for a UART or a GPIO pin. This means adding it to the device tree. Fortunately, we can use a device-tree fragment that's already been created for us.

Example command (to be run as root):

echo BB-BONE-PRU-01 >/sys/devices/bone_capemgr.9/slots

This should complete without any error messages. If you cat(1) the slots file, you should see an entry like:

8: ff:P-O-L Override Board Name,00A0,Override Manuf,BB-BONE-PRU-01

(Your result may show a different slot number; I have a couple other unrelated override boards loaded on my 'bone.)

The output of lsmod should show the uio_pruss module loaded. (This module is a tiny driver that lets us talk to the PRU from user-space. It's used by the libprussdrv library we installed earlier.)

The dmesg output should show something like:

bone-capemgr bone_capemgr.9: part_number 'BB-BONE-PRU-01', version 'N/A'
bone-capemgr bone_capemgr.9: slot #8: generic override
bone-capemgr bone_capemgr.9: bone: Using override eeprom data at slot 8
bone-capemgr bone_capemgr.9: slot #8: 'Override Board Name,00A0,Override Manuf,BB-BONE-PRU-01'
bone-capemgr bone_capemgr.9: slot #8: Requesting part number/version based 'BB-BONE-PRU-01-00A0.dtbo
bone-capemgr bone_capemgr.9: slot #8: Requesting firmware 'BB-BONE-PRU-01-00A0.dtbo' for board-name 'Override Board Name', version '00A0'
bone-capemgr bone_capemgr.9: slot #8: dtbo 'BB-BONE-PRU-01-00A0.dtbo' loaded; converting to live tree
bone-capemgr bone_capemgr.9: slot #8: #2 overlays
omap_hwmod: pruss: failed to hardreset
bone-capemgr bone_capemgr.9: slot #8: Applied #2 overlays.

I'm not sure what's going on with the "failed to hardreset" message, but stuff seems to work anyway...

Create the PRU Program

My goal for the PRU equivalent of the "Hello, world" program was to create the simplest possible program that still produces some kind of visible evidence that it's working. (That excludes a program that just immediately halts; how would you know it ran at all?)

Most of the options (like manipulating IO pins or communicating back to the host processor) involve extra complexity that I didn't want for this initial attempt -- mostly, dealing with the device tree in a non-trivial way. I settled on creating a program that would busy-loop for a fixed length of time, then exit.

You can download all my example code in a single archive, but here is the PRU assembly source if you want to take a quick look in your browser: example.p

Some additional explanation of the assembly source:

The source is written for the TI-provided "pasm" macro assembler. The authoritative documentation for this is in Section 5.3 of the AM335x PRU-ICSS Reference Guide (1.3MB PDF). This document explains the command-line arguments, input syntax and processor instruction set.

The .origin directive tells where the code is loaded into the PRU memory. (The PRU has its own 8KB instruction memory. I don't know of any reason you'd use another .origin for the entry point of your code.)

The .entrypoint directive is only for the debugger. Changing this won't change where the PRU starts executing instructions once your program is loaded.

The only thing my example program does that's even slightly tricky is the way it signals the host processor to indicate that it has finished. The register r31 is magic; if you write a '1' into bit 5, simultaneously with some value into bits 0-3, then an event will be sent to the host (by way of some excessively complicated mapping in the interrupt controller aka INTC).

The reference guide section 5.2.2.2 talks about how r31 works in this regard, while the INTC is discussed in sections 6 and 7. It's admittedly some heavy reading; for the purposes of getting started this may be something you can take on faith.

The last instruction halts the PRU. Without this, it would happily continue into whatever happened to be in instruction memory after the end of your program.

Create the Host-side Program

To load our program into the PRU, we'll use a program that runs in Linux user-space on the host CPU, and interfaces with a TI-supplied kernel module and library. A quick link to the example program: example.c

A minimal outline of what the program does:

  • initialize the library
  • set up the interrupt we want to use
  • load example.bin from the filesystem into PRU instruction memory and start the PRU
  • wait until the PRU asserts the interrupt, telling us the program has completed
  • clean up

Most of those steps are reducible to one (or a few) calls into library functions that are part of the TI-supplied libprussdrv library.

The library functions are documented in the AM335x PRU Linux Application Loader User Guide document (632KB PDF). Example code is in the pru_sw/example_apps/ directory within the am335x_pru_package software. (However, the examples are doing more complicated things than I am here.)

If you're really curious about what the library is doing internally, it comes with source -- in the pru_sw/app_loader/interface/ directory within the am335x_pru_package stuff.

Build and Test

Update: An archive containing the complete example including a Makefile is available for download: pru-helloworld.tar.gz (4KB).

To assemble example.p into example.bin, and to compile and link example.c into example, just run make with the supplied Makefile. There's nothing complicated going on there.

To run, invoke the example program (as root):

sudo ./example

This should take approximately five seconds to run, and produce output like:

waiting for interrupt from PRU0...
PRU program completed, event number 1

(The first line should appear almost immediately; the second, after the delay.) If it returns right away, stalls forever, or produces some kind of error message, then there's a problem. The "event number" you see in the second line may differ; it will increase with each run, and reset on reboot.

References

I never would have known about the PRU in the first place, much less ever figured out how to use it, without the help of a number of other information sources. For further reading:

  • Element 14 Forum Thread: BBB - Working with the PRU-ICSS/PRUSSv2 -- extremely valuable, and where I got started. Suffers a little for having to sift through a lot of volume (including some outdated stuff) to find what you need.
  • AM335x PRU-ICSS Reference Guide (1.3MB PDF) -- definitive and comprehensive, but not as helpful as some other sources when you're first getting started. Includes vital documentation of pasm assembler syntax and command line, and the PRU instruction set, as well as information about register usage and memory mapping.
  • The am335x_pru_package project on GitHub -- contains necessary library code and headers, helpful examples and additional documentation.
  • The AM335x PRU Linux Application Loader User Guide (632KB PDF) (from the am335x_pru_package) -- documents the libprussdrv library API.
  • The AM335x Technical Reference Guide (20MB PDF) -- the full processor documentation, from the TI product page; what you need is in there... somewhere.

Updated 17Feb2014 by DGH: Added link to example archive including Makefile. Thanks to John Cutler for telling me about the omission.

Updated 01Jun2014 by DGH: Added mention of and link to TI PRU C compiler. Thanks again to John Cutler.

Updated 07Jun2014 by DGH: Added link to EmbeddedRelated article with TI compiler tutorial. We have a John Cutler trifecta!