Thursday, December 5, 2013

OpenMV Update: 25FPS Face Detection, USB Support and More

So I've been working on OpenMV for the past week and this is what I have so far:

USB Support:
The camera now supports USB OTG full speed. I've also written a small userspace tool with libusb/SDL to interface with the camera and view the frame buffer. This makes it really easy to debug the image processing code, and it also lets you change the sensor's registers while watching the results in real time.


I mentioned building the STM32F4xx libraries in a previous post; you can check out the repo linked there if you want to build the libraries.

Face Detection:
Many were very interested in this feature. Well, I've managed to get the Viola-Jones face detector working on the camera, and it's working fine. For those of you familiar with the detector, the Haar cascade is exported as a C header, which is linked into the binary and loaded into the CCM (Core Coupled Memory), a 64KB memory block connected directly to the core. Only one integral image is pre-computed and allocated on the heap. The other one, the squared integral image, which is used for computing the standard deviation, can't fit into memory at QQVGA resolution, so instead the standard deviation is computed on the fly for every detection scale, using some SIMD instructions to speed it up a bit.
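To make the memory trade-off concrete: assuming 32-bit entries, one integral image at QQVGA (160x120) takes 160 x 120 x 4 = 76,800 bytes, so keeping a second, squared one doubles that. The sketch below is plain C, not the actual OpenMV code, and uses no SIMD. It shows one way to get a window's variance (the standard deviation is its square root) from a single integral image plus a direct pass over the window's pixels. The function names and the per-window approach are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Build the integral image: ii[y*w + x] is the sum of all pixels in the
 * rectangle from (0,0) to (x,y), inclusive. */
void integral_image(const uint8_t *img, uint32_t *ii, int w, int h)
{
    assert(img != 0 && ii != 0 && w > 0 && h > 0);
    for (int y = 0; y < h; y++) {
        uint32_t row_sum = 0;
        for (int x = 0; x < w; x++) {
            row_sum += img[y * w + x];
            ii[y * w + x] = row_sum + (y > 0 ? ii[(y - 1) * w + x] : 0);
        }
    }
}

/* Sum of the pixels in the window [x, x+ww) x [y, y+wh), using four lookups. */
uint32_t window_sum(const uint32_t *ii, int w, int x, int y, int ww, int wh)
{
    int x1 = x + ww - 1, y1 = y + wh - 1;
    uint32_t a = (x > 0 && y > 0) ? ii[(y - 1) * w + (x - 1)] : 0;
    uint32_t b = (y > 0) ? ii[(y - 1) * w + x1] : 0;
    uint32_t c = (x > 0) ? ii[y1 * w + (x - 1)] : 0;
    uint32_t d = ii[y1 * w + x1];
    return d - b - c + a;
}

/* Variance of a window without a squared integral image: the mean comes from
 * the integral image, the sum of squares is accumulated from the pixels. */
float window_variance(const uint8_t *img, const uint32_t *ii, int w,
                      int x, int y, int ww, int wh)
{
    uint32_t sq_sum = 0;
    for (int j = y; j < y + wh; j++) {
        for (int i = x; i < x + ww; i++) {
            uint32_t p = img[j * w + i];
            sq_sum += p * p;
        }
    }
    float n = (float)(ww * wh);
    float mean = (float)window_sum(ii, w, x, y, ww, wh) / n;
    return (float)sq_sum / n - mean * mean;
}
```

The direct pass costs one multiply-accumulate per pixel in the window, which is the kind of loop that SIMD instructions can speed up.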


The memory can hold up to 23 stages. However, using only 12 stages and a relatively large scale step, the detector is working great, with occasional false detections of course. More stages can be used if greater accuracy is required, but not without some performance penalty. As for the numbers, the camera can process 7-8FPS at QQVGA (160x120), and for QQCIF (88x72) I get 25FPS.

Here's a video of the face detector in action running at 25FPS:


Here's another video of color tracking running at 30FPS:


Other Updates:
I've been doing some general fixes here and there, mainly to improve the image quality. In addition to that, I've compiled all the libraries and code with optimizations (-O2) and I've seen great improvements in speed. There's also a new pixel format, grayscale, which is basically just the Y channel extracted from the YUV422 output, stored once to avoid repeating the extraction every time a grayscale image is required.
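As a rough illustration of the grayscale format, here is a minimal sketch of pulling the Y channel out of a packed YUV422 buffer. It assumes YUYV byte order (Y0 U Y1 V), which depends on how the sensor is configured, and the function name is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy the luma (Y) bytes out of a packed YUV422 buffer.
 * Assumes YUYV byte order, so every pixel's Y sits at an even byte offset
 * and the input holds 2 bytes per pixel. */
void yuv422_to_gray(const uint8_t *yuv, uint8_t *gray, size_t num_pixels)
{
    assert(yuv != NULL && gray != NULL);
    for (size_t i = 0; i < num_pixels; i++) {
        gray[i] = yuv[2 * i];
    }
}
```

The grayscale buffer is half the size of the YUV422 one, which also helps on a memory-constrained micro.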

The QCIF/QQCIF modes are working now (the sensor can output 60FPS when using QQCIF), and through some other register probing, I've removed a few useless register settings and discovered that the sensor has digital zoom, cool!

There's also simple motion detection code in progress. It's based on frame differencing, using the first frame as the background. More work will be done here as soon as I get around to it, and I will probably try template matching next.
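Frame differencing against a fixed background can be sketched in a few lines. The version below works on grayscale frames. The per-pixel threshold, the minimum changed-pixel count and both function names are assumptions for illustration, not the code in the repository.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Count the pixels whose absolute difference from the background frame
 * is larger than the given threshold. */
size_t count_changed_pixels(const uint8_t *background, const uint8_t *frame,
                            size_t num_pixels, uint8_t threshold)
{
    assert(background != NULL && frame != NULL);
    size_t changed = 0;
    for (size_t i = 0; i < num_pixels; i++) {
        int diff = abs((int)frame[i] - (int)background[i]);
        if (diff > threshold) {
            changed++;
        }
    }
    return changed;
}

/* Report motion when more than min_changed pixels differ from the background. */
int motion_detected(const uint8_t *background, const uint8_t *frame,
                    size_t num_pixels, uint8_t threshold, size_t min_changed)
{
    return count_changed_pixels(background, frame, num_pixels, threshold) > min_changed;
}
```

Using the first frame as the background is simple, but it treats any lasting change in the scene (lighting, a moved object) as motion forever.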


I've also just finished a new hardware revision. It has a tiny uSD socket, which I imagine can be used for anything from storing Haar cascades, snapshots or video to buffering larger frames. The new revision is also a bit smaller.

Monday, December 2, 2013

Using The CCM Memory on the STM32

The STM32 series have non-contiguous memories divided into blocks. For example, the STM32F4 has 2 (contiguous) blocks of SRAM connected to the bus matrix with different interconnects, and a Core Coupled Memory (CCM) block which is connected directly to the core.


Because the CCM is tightly coupled to the core, it is accessed with zero wait states and the core has exclusive access to it. So, for example, while other bus masters are using the main SRAM, the core can still access the CCM. The CCM block is therefore commonly used for the stack and other critical OS data; this partitioning allows the core to continue executing code while, for example, a DMA transfer takes place. However, the CCM can also be used as an extra memory block. Doing so is easy, and there are a few examples out there that show how. Simply defining a section in the linker script will do:
.ccm : {
  . = ALIGN(4);
  _sccm = .;
  *(.ccm)
  . = ALIGN(4);      
  _eccm = .;
}>CCM

And a section attribute is used to place a variable in that section:
const int8_t my_array[13] __attribute__ ((section (".ccm")))= {....};

However, what if you want to load initialized data into that section, some look-up tables for example? Defining the section is not enough. The linker script makes a distinction between the Load Memory Address (LMA), where the data is stored initially, and the Virtual Memory Address (VMA), where the data should be at runtime. If the LMA is not specified explicitly, it is the same as the VMA.

You can see here that GDB loads the .ccm data into the CCM block (LMA=VMA=0x10000000) directly, while all other sections are loaded into the flash region (0x8xxxxxx):

Loading section .ccm, size 0x4ebc lma 0x10000000
Loading section .isr_vector, size 0x188 lma 0x8000000
Loading section .text, size 0x9744 lma 0x8000188
Loading section .ARM, size 0x8 lma 0x80098cc
Loading section .init_array, size 0x8 lma 0x80098d4
Loading section .fini_array, size 0x4 lma 0x80098dc
Loading section .data, size 0xa30 lma 0x80098e0
Loading section .jcr, size 0x4 lma 0x800a310

While this may look right, it's not. If GDB loads the .ccm section directly into the CCM RAM, the data will disappear after a power cycle! So instead, we want the LMA to be somewhere in the FLASH region (0x8xxxxxx) and the VMA to be in the CCM (0x10000000):
_eidata = (_sidata + SIZEOF(.data) + SIZEOF(.jcr));
.ccm : AT ( _sidata + SIZEOF(.data) + SIZEOF(.jcr))
{
  . = ALIGN(4);
  _sccm = .;
  *(.ccm)
  . = ALIGN(4);      
  _eccm = .;
}>CCM

Note that the .jcr section is pulled in by some startup code for something related to Java (GCC's Java class registration). Without adding SIZEOF(.jcr), the .ccm data will overlap that section. Also note the _eidata symbol, which will be referenced later in code. Now, when you try to load the ELF, GDB prints:

Loading section .isr_vector, size 0x188 lma 0x8000000
Loading section .text, size 0x9794 lma 0x8000188
Loading section .ARM, size 0x8 lma 0x800991c
Loading section .init_array, size 0x8 lma 0x8009924
Loading section .fini_array, size 0x4 lma 0x800992c
Loading section .data, size 0xa30 lma 0x8009930
Loading section .jcr, size 0x4 lma 0x800a360
Loading section .ccm, size 0x4ebc lma 0x800a364

Great, now the .ccm data is loaded into the FLASH region. We just need something to copy it from FLASH to CCM at runtime. If you look at the startup code, there's an assembly function that copies initialized data from the flash to where it should be in SRAM (the VMA). You need to do the same for the .ccm data, either by modifying the startup code or, preferably, by copying the data with a C function, so here it is:
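void load_ccm_section (void) __attribute__ ((section (".init")));
void load_ccm_section (void){
    /* Symbols defined in the linker script */
    extern char _eidata, _sccm, _eccm;

    char *src = &_eidata;   /* LMA: start of the .ccm data in FLASH */
    char *dst = &_sccm;     /* VMA: start of the .ccm section in CCM */
    while (dst < &_eccm) {
        *dst++ = *src++;
    }
}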
Note that the function is placed into the .init section so it executes before main. Now at runtime, this function will copy the data from FLASH into the CCM using the symbols defined in the linker script.
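The copy itself is just "move size bytes from the load address to the run address", so it can be checked on a PC before trusting it on the target. The sketch below is a hypothetical helper (copy_section is not part of any startup code) that does the same job with memcpy. On the target, src would be &_eidata and the destination range would be &_sccm to &_eccm.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy a section image from its load address (LMA) to its run address (VMA).
 * [dst_start, dst_end) is the section's range at runtime; src points to the
 * copy stored in FLASH. Returns the number of bytes copied. */
size_t copy_section(const char *src, char *dst_start, char *dst_end)
{
    assert(src != NULL && dst_start != NULL && dst_end >= dst_start);
    size_t size = (size_t)(dst_end - dst_start);
    memcpy(dst_start, src, size);
    return size;
}
```

On a PC, two ordinary arrays can stand in for the FLASH and CCM regions to confirm that the range arithmetic is right.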

Thursday, November 28, 2013

STM32F4xx Libraries



I wrote a tutorial before on how to set up a toolchain and build the STM32F4xx standard peripheral drivers into one convenient library. Since then, a few people have asked me about the library, so to make life easier, I downloaded the latest StdPeriph/CMSIS, in addition to a few other libraries that I might need later, and shared everything in one repository, which currently has the following libraries:
Cortex-M  CMSIS      V3.20
STM32F4xx CMSIS      V1.3.0
STM32F4xx StdPeriph  V1.3.0
STM32_USB_Device     V1.1.0
STM32_USB_OTG        V2.1.0
In addition to those, the repository also includes a simple USB device library (stm32f4xx/USB_Generic), which abstracts all the horrible details of the USB libraries into a very simple generic USB device implementation with just two bulk endpoints.

To use this library, you just pass a struct with two callback functions, and the library will call those functions whenever data is received or requested. It's as simple as that. Note that it's configured for OTG FS only, but it could still be useful if you just want to get USB working and don't have time to go through all the examples.
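The real USB_Generic headers are not reproduced here, so the following is only a sketch of the callback-struct pattern described above. Every name in it (bulk_callbacks_t, data_out, data_in and the echo handlers) is hypothetical and is not the library's actual API; check the repository for the real struct and function names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callback table: one handler per bulk endpoint direction. */
typedef struct {
    void   (*data_out)(const uint8_t *buf, size_t len); /* host sent data     */
    size_t (*data_in)(uint8_t *buf, size_t max_len);    /* host requests data */
} bulk_callbacks_t;

/* Example device: remembers the last packet and echoes it back on request. */
static uint8_t echo_buf[64];
static size_t  echo_len;

static void echo_data_out(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    echo_len = len < sizeof(echo_buf) ? len : sizeof(echo_buf);
    memcpy(echo_buf, buf, echo_len);
}

static size_t echo_data_in(uint8_t *buf, size_t max_len)
{
    assert(buf != NULL);
    size_t n = echo_len < max_len ? echo_len : max_len;
    memcpy(buf, echo_buf, n);
    return n;
}

/* The table an application would hand to the USB layer at init time. */
static const bulk_callbacks_t echo_callbacks = { echo_data_out, echo_data_in };
```

The appeal of this pattern is that the application only deals with buffers and lengths; all the endpoint and descriptor handling stays inside the library.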

Finally, the repository also includes some examples: a Blinky, a USB_Generic example and some user-space code with libusb.

Building The Libraries:
To build the libraries and examples, just type make in the top directory. The top Makefile will pass along all the flags and variables. Here are some options you can pass on the command line:

make DEBUG=0
This will build everything with -O2 and no debugging symbols (not recommended)

make DEBUG=1 CFLAGS="-DOSC=xx"
This will build the library with debugging enabled, no optimization and using the given crystal frequency in MHz (for example -DOSC=16)

Repository:
https://github.com/iabdalkader/stm32f4xx.git

Sunday, November 17, 2013

FT231X Breakout

This is a breakout board for FTDI's latest USB-to-Serial bridge, the FT231X. This one comes in a smaller package (SSOP20) and it's cheaper than its predecessor, the FT232R/L. It also offers a charging detection feature, which I'm not really interested in, but anyway, see this post for more details.


I've seen a couple of good breakout boards out there, but this one has a few advantages over the others: it has a solder jumper to switch VCCIO between 3.3V and 5.0V, a 500mA PTC fuse for over-current protection, and it's pin-compatible with the Arduino Mini programming header.

Repo:
https://github.com/iabdalkader/ft231x-breakout.git


Sunday, November 10, 2013

Color Tracking With OpenMV

So I finally had some free time to work on OpenMV. For those of you who haven't been following this project, OpenMV is an open-source tiny machine vision module based on an STM32F4 ARM Cortex-M4 micro and an OV9650 sensor. I started this project mainly because I find the existing cameras either too limited for their price or too expensive if they do some image processing, so one of my main goals was to make this as cheap as possible. The total cost of this module so far is around $20 for a single board, and it could go as low as $15 for 500 pieces.

I wrote a simple color tracking algorithm for the camera, which I tested using an Arduino. The Arduino sends out a few commands via the serial port to the camera, telling it to capture and process a frame. It then receives back the coordinates of the object and controls the servos accordingly. I was able to process around 15FPS, which is not bad given the current naive implementation. This is a video of the camera tracking a ball:

You can find the code and schematics here in a single repository. Keep in mind that I'm still working on it, and I'm also considering a new revision to fix some minor issues and add an SPI flash for storage. If you have any suggestions/feedback, please feel free to leave a comment.
hg clone https://code.google.com/p/openmv-camera/
Update: I've created a new repo on GitHub, which has the most recent version:
git clone https://github.com/iabdalkader/openmv.git
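For readers curious what a naive color tracker of the kind described in this post can look like, here is a minimal sketch. It is not the code from the repository. It assumes RGB565 pixels stored as native 16-bit values, compares each channel against a target color with a tolerance, and returns the centroid of the matching pixels. The blob_t type and the function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    int found;  /* 1 if at least one pixel matched */
    int x, y;   /* centroid of the matching pixels */
} blob_t;

/* Split an RGB565 pixel into its 5-bit red, 6-bit green and 5-bit blue parts. */
static void rgb565_unpack(uint16_t p, int *r, int *g, int *b)
{
    *r = (p >> 11) & 0x1F;
    *g = (p >> 5) & 0x3F;
    *b = p & 0x1F;
}

/* Return the centroid of all pixels within tol of the target on every channel. */
blob_t track_color(const uint16_t *frame, int w, int h, uint16_t target, int tol)
{
    assert(frame != 0 && w > 0 && h > 0);
    int tr, tg, tb;
    rgb565_unpack(target, &tr, &tg, &tb);

    long sum_x = 0, sum_y = 0, count = 0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int r, g, b;
            rgb565_unpack(frame[y * w + x], &r, &g, &b);
            if (abs(r - tr) <= tol && abs(g - tg) <= tol && abs(b - tb) <= tol) {
                sum_x += x;
                sum_y += y;
                count++;
            }
        }
    }

    blob_t result = {0, 0, 0};
    if (count > 0) {
        result.found = 1;
        result.x = (int)(sum_x / count);
        result.y = (int)(sum_y / count);
    }
    return result;
}
```

The centroid is what a host such as the Arduino would need to steer the servos: the offset between (x, y) and the frame center tells it which way to turn.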

Thursday, October 17, 2013

OpenMV Update

Another quick update on my project OpenMV. I've finally managed to get the sensor working in QQVGA/RGB565 mode:


I'm using the discovery board as a debugger and a smart LCD to display the frames via USART. I've also implemented a simple color tracking algorithm, as a proof of concept:


I'm currently cleaning up the code, and I will share it with the schematics next time. Thanks for all your support :)


Monday, September 16, 2013

OpenMV Camera Module

This is a quick update on my camera project (OpenMV). I've finally received the long overdue PCBs today and assembled one:


I haven't written any code yet, but I've managed to get it into DFU mode and upload a blinky. It seems to be working fine so far, no smoke :) I will post another update or a video as soon as I get something cool running.



Friday, July 19, 2013

NHD-C12832 Breakout

This is a breakout board for NewHaven's NHD-C12832 graphic display, a 128x32 pixel display with an SPI interface. I like this display mainly because it's cheap (about $11); I think it was the cheapest one I could find with this resolution, and it also looks great. However, it has an unusual package (tight-pitch pins and four holes, two for the backlight and two for the screen), so I had to spend some time working on the footprint and PCB, but the end result was good.

I placed all components on the backside of the PCB to keep it as small as possible. The board has a 3.3V 150mA LDO regulator (the display operates from 2.6V to 3.3V) and a level shifter (74VHC541) to convert the logic signals to the operating voltage, so it's compatible with 5.0V logic and can be powered from 3.3V up to the maximum rating of the voltage regulator. The backlight is connected to the regulator via a MOSFET (it draws 45mA maximum), which can be controlled with PWM. The display itself draws about 0.45mA maximum.
The display controller is supported by u8glib, so I wrote a small Arduino sketch to test drawing a bitmap. To generate the bitmap, I used GIMP to convert the image to black and white (1-bit), exported it as an .xbm file (a C header with the pixel data as hex bytes) and then included that as a header in the sketch.
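An .xbm file is just a C array of bytes, so it helps to know how the pixels are packed when checking an exported bitmap. In the XBM format each row is padded to a whole number of bytes, and the leftmost pixel of each byte is the least significant bit. A small sketch of reading one pixel back (the function name is made up for this example):

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if pixel (x, y) is set in XBM-style data, 0 otherwise.
 * Rows are padded to whole bytes; bit 0 of each byte is the leftmost pixel. */
int xbm_get_pixel(const uint8_t *bits, int width, int x, int y)
{
    assert(bits != 0 && x >= 0 && x < width && y >= 0);
    int bytes_per_row = (width + 7) / 8;
    uint8_t byte = bits[y * bytes_per_row + x / 8];
    return (byte >> (x % 8)) & 1;
}
```

For the 128x32 display, a full-screen bitmap is 16 bytes per row and 512 bytes in total.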



Sources:
The repository includes the Eagle files and the Arduino sketch.

hg clone https://code.google.com/p/nhd-c12832-breakout

Tuesday, July 9, 2013

In Search of a Better Serial Camera Module

So I've been looking everywhere for a cheap serial camera to use in my projects, preferably with some basic image processing, like object or motion detection. The cheapest one I could find that barely meets my needs is sold on Adafruit for $35; it has a 640x480 pixel sensor and can do some motion detection but nothing more. Then there are the more expensive modules, which cost around $40 and do nothing at all, and finally there's the CMUcam, which has some really nice features but is way too big and expensive (it costs $100) for my needs.

So I decided to make my own serial camera, keeping in mind the basic set of features that I want:
  • Low cost
  • Small form
  • Basic image processing
  • Open source (duh!)
First, I had to choose an image sensor. I was inclined to use the TCM8230MD, which I'm familiar with; however, the sensor alone costs $10 and I can't seem to find it anywhere other than SparkFun. So I decided to try the OmniVision sensors. The cheapest one I could find is the OV9650 (1280x1024 pixels), sold on eBay for $2. The nice thing about this sensor is that it connects to the board with an FPC cable (that flexible yellow cable), which means it's possible to replace it later with another one (assuming it has the same pinout), and it also has a higher resolution than the TCM8230MD. The downside of this particular sensor is that it doesn't have JPEG compression, but I can live with RGB/RAW output. After all, I plan to use it mainly for image processing, or I could try to implement JPEG compression on the micro.

Moving on to the microcontroller, a powerful micro is needed to interface with this sensor, preferably with a hardware DCMI (digital camera interface). I implemented a camera interface in software before, with an LPC1768/TCM8230MD, and it could barely keep up, so I decided to go with something faster.

Fortunately, I had a couple of STM32F4 micros lying around, which seemed perfect for the job: the STM32F4 runs at 168MHz, has a hardware DCMI (which should make it a lot easier to interface the camera) and, as an added bonus, has a floating point unit and SIMD instructions, making it perfect for image processing. Unfortunately, the DCMI is only available in the LQFP-100 package or larger, so I couldn't use a smaller one.

I started working on the PCB. First, I made a footprint for the sensor and its connector, which seems to fit nicely:
Then I moved on to the layout. However, shortly after that, I realized that it's impossible to fit the sensor and micro (let alone the debugging header and interface) on a 2-layer PCB, which is what I had in mind initially to reduce costs. So a 4-layer PCB seemed inevitable if I wanted to make it as small as possible. There are also some inherent benefits to the extra layers, such as better power planes and decoupling, easier routing, etc. Anyway, the first version of the PCB (25x37mm) costs around $5.


The board has a triple-output LDO (3.3V/2.5V/1.5V; the one I had at hand came in a QFN package) and a micro USB connector, which can be used to power the board or update the firmware with DFU. The SWD pins are broken out for debugging (I'm currently working on a JTAG debugger too). I also threw in an RGB LED (not sure if it's too close to the sensor to be visible or not).

The STM32F4 micro costs $11.79 for one and $7.31 each for 500. This adds up to around $20 for one piece and could go as low as $15 for 500 pieces. It's much higher than I expected, but it should make up for it with some cool image processing, and it's still almost $15 cheaper than the Adafruit camera :)

I haven't received the PCBs yet. I will post an update when I do, and when I get it working I will release all the sources. I'm also giving away one for free, so leave a comment if you're interested :)

Sunday, May 5, 2013

Running OLinuXino-MAXI on Battery


Finally, I've had some time to play with my new Olimex iMX233-OLinuXino-MAXI board. The board is similar to the Raspberry Pi, with almost the same size and cost; however, this one has an ARM iMX233 running at 454MHz, less RAM and no GPU.

I played with the board for a while, then I decided to test it using batteries. I found out that the USB/Ethernet PHY is disabled when running on batteries! After some digging around in the datasheet, there seems to be a way around this: swapping two jumpers (3.3V_E and 3.3VIO_E), located near the reset button, that control which DC-DC converter is used. The two possible configurations for those jumpers are:
  1.  External DC-DC: 3.3V_E closed (soldered), 3.3VIO_E open (unsoldered/cut)
  2.  On-chip DC-DC: 3.3V_E open (unsoldered/cut), 3.3VIO_E closed (soldered)

Note that the datasheet mentions that the maximum current that can safely be drawn from the on-chip converter is 200mA, and you need to make sure that you don't draw more than that. I'm not sure how exactly, but you should probably use one USB device at a time, and either the USB or the LAN, not both.

It also mentions that the internal DC-DC converter makes the chip heat up a little bit, so you may want to place a small heat sink on the processor.