Before you start, why not email and request your very own developer ID, and one or more layer role code/keycode pairs so that your plugin(s) can be distributed? It's free, easy, and fast.
Note that if you do not get your free ID, keycode(s) and role(s), your plugins are subject to restrictions:
This is not being done to inconvenience you; it's being done to ensure that the plugin system remains reliable and valuable for everyone, including you. It guarantees that every plugin in the field has unique identification characteristics, which in turn means the application can always identify plugins and their settings correctly, and that black-hat plugins can be disabled if need be.
The gory details are in the developer.h file, including the email address to use, suggested email content, and so on.
Within dTank (β), images are maintained as three 16-bit channels of red, green and blue information. Within a channel, the first (0th) 16-bit word is located at the top left of the image; the last, at the bottom right. Channel data is arranged row by row; that is, all of the information for the first (0th) row comes first, with the leftmost pixel coming first and the rightmost pixel coming last, followed by the next row, and so on.
In some cases, dTank (β) also uses "alpha" channels; these are 16-bit-per-pixel channels, arranged just like the image channels, that are used in various ways.
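For example, under this row-by-row layout, the 16-bit value for the pixel at column x, row y of a channel lives at index y * width + x. Here's a tiny sketch of that addressing; the width parameter is a placeholder, since this section doesn't name the field that carries the image dimensions:

/* Row-major addressing into one 16-bit channel. "chan" could be any image or
   alpha channel; "width" is a stand-in for however your plugin learns the
   image width. */
unsigned short pixel_at(unsigned short *chan, long width, long x, long y)
{
    return chan[y * width + x];   /* all of row y comes before any of row y+1 */
}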
Images are "resolution independent", which means that from the point of view of your plugin, you can never anticipate what resolution image (in terms of pixels) you will be asked to process, even for the same image. This has critical implications for anything you do that is geometrically sensitive: for instance, a 10-pixel brush does not have a similar effect on a 1000x1000 image as compared to a 100x100 image. For this reason, anything that has a "size", conceptually speaking, should be specified as a floating point (or double) entity that is relative to the x-width of the image.
For instance, instead of dealing with a 10-pixel brush, you deal with a brush that is 10% of the image width. That way, when you process the 100x100 image, the brush will cover 10% of it, or 10x10 pixels; and when you process a 1000x1000 image, the brush will still cover 10% of it, because you'll set it to 100x100 pixels, or 10% of the new resolution. Never assume you know what resolution the user will cause the program to hand your plug-in. You can't know; so don't try. Use a resolution-independent approach at all times.
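In code, that just means carrying sizes around as fractions and converting to pixels at the last possible moment. A minimal sketch (the variable names here are placeholders, not part of the plugin interface):

/* Resolution-independent sizing: store the brush size as a fraction of the
   image width, and convert to pixels only when pixels are actually needed. */
float brush_fraction = 0.10f;          /* "10% of the image width" */
long  brush_pixels;

brush_pixels = (long)(brush_fraction * (float)image_width + 0.5f);
/* image_width == 100   ->  brush_pixels == 10
   image_width == 1000  ->  brush_pixels == 100 */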
Plugins have the following basic operational areas:
You can, at any time during the execution of your plugin, call services we extend to the plugin environment. These include memory allocations and deallocations, retrieval of eyedropper and marker information, and a few other handy things. See the descriptions in developer.h that accompany the #defines that begin with REQ_
The developer interface is both extremely simple and very powerful. In terms of integration, your plugins are first-class citizens. No intermediate files are required, and your plugin interface is added right into the system with the stock interface panels. Your processing runs right along with the system processing. Your panels can be re-arranged along with the stock panels and other plugin panels by the end user.
dTank (β) can multithread your plugin — or portions of it — all you have to do is ask. It can be called in a flexible manner that allows you to build the simplest possible processing code, while dTank (β) manages the complexity — and there's a lot of that — behind the scenes.
You don't need Cocoa; you can build a 100% functional plugin that presents a well organized OS X user interface without knowing anything about Cocoa. The demos, in fact, are built with GCC instead of XCode in order to demonstrate just how Cocoa-free the mechanism is. You can literally build a complete plugin in minutes without any Mac-specific knowledge.
dTank (β) gives you the ability to create buttons, checkboxes, sliders, and color wells on a panel dedicated to your plugin that integrates completely with the effects native to the software. These panels are created for you using the native rendering of the resident operating system, so they have the precise look and feel that the user expects. All this with no work on your part!
On the other hand, if you want to make a beautiful interface of your own with all manner of Cocoa amenities that pops open in its own window, even maintains its own data elsewhere in the filesystem, yes, you can do that, too. See akplugobjc.c and build_akplugobjc for an example of programmatic use of Cocoa to build an interface window (extremely simple, and still just gcc, but remember, this is just a demo.)
Of course you can also build a plugin in any language that can produce a proper .so object, as long as you can move the information from the developer.h file into your chosen language successfully.
The only "Official" thing you really need to do in order to develop plugins is contact us as explained in the developer.h file, and that is request your free developer ID and layer role code(s) so that your plugins can be distributed (any way you see fit) to other users of the software.
The following is a demo plugin that applies linear brightness using a single 64k remap layer.
You can build plugin_demo.c by executing the build_plugin script we supply.
As simple as it is, it offers the following features:
Basically, here's how it works. Each of the R, G and B channels of the image contains pixel values from 0...65535 (where 0 corresponds to off, and 65535 corresponds to maximum brightness in any channel.) R, G and B stand for Red, Green and Blue respectively.
The goal of the plugin is to implement brightness; and even for those new to image processing, it may be fairly obvious that to make the image brighter, we simply need to increase each pixel's channel values.
So the approach in this plugin is to precalculate some information:
This is what the pengine() call does. That's all it does. So if the user moves the brightness slider, pengine() is called, the table is set up (possibly by multiple threads) to match the new brightness setting, and done.
Here's a breakdown of exactly what pengine() is doing.
First we calculate b, which is an invariant value from the point of view of the pengine() processing loop. p->f[0] comes from the user's slider setting, and may vary from zero to 200 (because we said so in setup()):
b = (p->f[0] - 100.0f) * 655.35f; // b is now [-65535...65535]
This gives us a constant offset, anywhere from -65535 to 65535, to add to every brightness value; that's enough range to push a value from one end of the brightness scale (0) all the way to the other (65535).
In the loop, for every possible brightness value from 0 to 65535, the adjustment in b is added, and the result is then limited so it can go no brighter than the maximum brightness representable in 16 bits, and no darker than zero. The first line calculates the new brightness and puts it in v:
v = i + b;
Next, we make sure blacker than black is just black:
if (v < 0) v = 0;
Then we make sure whiter than white is just white:
if (v > 65535) v = 65535;
...and finally, we insert the bounds-checked brightness change into the appropriate place in the lr layer, where we are building a 64K table of all possible bounds-checked brightness changes:
p->lr[i] = v;
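Put back together, that is all pengine() does in this demo. Here is the whole thing as one sketch of the loop body; the local declarations and the use of p->start and p->finish here are assumptions (the same slice-of-work fields dengine() uses), so see plugin_demo.c for the authoritative version:

/* Sketch reassembled from the fragments above; declarations and the use of
   p->start / p->finish are assumptions -- plugin_demo.c is the real thing. */
long i, v;
float b;

b = (p->f[0] - 100.0f) * 655.35f;       /* slider 0..200 -> offset -65535..65535 */

for (i = p->start; i < p->finish; i++)  /* this call's slice of 0...65535        */
{
    v = i + b;                          /* proposed new brightness for value i   */
    if (v < 0) v = 0;                   /* blacker than black is just black      */
    if (v > 65535) v = 65535;           /* whiter than white is just white       */
    p->lr[i] = v;                       /* store it in the 64K remap table       */
}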
Because we've allowed that the plugin's pengine() call may be threaded, you don't know when you design the plugin what the ranges passed in as the values of start and finish will be (it depends on how many cores the user decides to allow the software to use) or how many times pengine() will be called during the setup process.
But what you do know is that by the time the process is over, some number of threads (from 1 to however many cores the user makes available) will have called pengine() with every possible brightness value in the range you specified, which was 0...65535.
Since you know that, you need not be concerned with exactly what range or what order things happen in at execution time; just be aware that they all eventually get done, possibly by quite a few different threads.
Now, when the time comes to draw the image, because of the pre-calculation done in pengine(), there are no calculations or tests remaining to be done.
They've all already been computed, and the results placed into that table in the 64K lr layer. So all dengine() has to do to apply that particular brightness setting is look up each pixel's R, G and B brightness values, which live in the mr, mg and mb channels, using that 64K table in lr, and then place the looked-up value in the output buffer, which is made up of the sr, sg and sb channels.
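To put numbers on it: if the brightness slider sits at 110, pengine() computed b = (110 - 100) * 655.35 = 6553.5, so the table entry lr[30000] holds 36553 (the sum 36553.5 truncated to an integer, comfortably inside the 0...65535 clamp). A pixel whose original green value mg[i] is 30000 therefore comes out of the lookup as sg[i] = 36553, with no arithmetic at all at draw time.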
So the start...finish loop causes dengine() to process some large set of the image's pixels in some order. Again, we don't know which threads finish first, or what the values of start and finish will be that they process — nor do we care. Inside the loop, each pixel gets the same treatment, and all of them will be processed by one core or another by the time the threaded operation is complete.
Here's a breakdown of the green working line in the dengine() process:
+----------+-----+--- plug struct pointer: contains image's channels, maps, and data
|          |     |
v          v     v
p->sg[i] = p->lr[p->mg[i]];
   ^          ^     ^  ^
   |          |     |  |
   |          |     |  +--- this selects which pixel in the green channel
   |          |     |
   |          |     +--- mg[i] is original green brightness...
   |          |
   |          +--- lr[mg[i]] remaps the brightness...
   |
   +--- ...the result goes into the green output buffer
All three lines are similar; they just individually remap the red, green and blue channels. That's all there is to that.
The threading issues are very similar. Threading was requested, so you don't know what the actual values of start and finish will be for any one call to dengine(); but you DO know that every pixel in the image will be processed, and that's all you really need to know in this case.
You can (and should) learn more about the plugin system by reading the developer.h include file. Lots of useful comments in there!
plugin_demo.c is compiled into a .so run-time library. When dTank (β) starts, it searches its directory (folder) for .so files named in this fashion...
ak_X.so
...where X is a unique name for each plugin like foobar, twinkle, or johndoe.
If the user decides to add your panel to the working set of panels, it will appear in dTank (β) as defined by your setup() call, with various numbers of sliders, checkmarks and buttons. If the user sets a parameter, your init(), and then your preprocessor (pengine()), will be called; then, when the image is ready to be processed, your dengine() will be called. You have to define a pengine() procedure, but it doesn't have to do anything if your plugin does all its work in dengine().
However... the whole point of providing a layered pre-process is that some work can be done only when settings change, and therefore not cost the user time with every redraw. Please keep that in mind. Users won't look kindly upon plugins that slow down the overall drawing speed of the system.
One thing that can get in your way is that between pengine() and dengine(), and between phases within the scope of either engine's activity, the only obviously persistent data is that in the layers and the settings the user has made.
There are actually two ways you can pass information between phases, and even between pengine() and dengine().
The first is to define a structure that can hold all the data you will need, such as this imaginary situation...
struct mydata
{
    long value_one;
    unsigned char string[80];
    unsigned short r_palette[256];
    unsigned short g_palette[256];
    unsigned short b_palette[256];
    float limits[16];
};
...from here, still in init(), allocate the structure using the REQ_MEMORY service...
struct req re;
struct pkg *pk;

re.svc = REQ_MEMORY;             // we want memory
re.v1 = sizeof(struct mydata);   // this amount
pk = service(&re);               // ask... pointer will be in pk->p1

re.svc = REQ_HOLDMYBEER;         // we want the pointer to be held for us
re.p1 = pk->p1;                  // get pointer (it could be NULL, by the way)
pk = service(&re);               // have the application maintain the pointer
...now, the application is holding on to that pointer for you. Not only that, but because you obtained the memory with REQ_MEMORY, when your plugin exits, it will automatically be cleaned up for you, so you don't have to explicitly free it.
So, when the time(s) come(s) in pengine() or dengine() that you need to read or write that data, you simply do this...
struct mydata *md;
struct req re;
struct pkg *pk;

re.svc = REQ_THRISTYNOW;         // we want our pointer back
pk = service(&re);               // make the request
md = pk->p1;                     // retrieve the pointer

if (md)                          // if the pointer is not NULL
{
    // do cool stuff with the data in md->...
}
else                             // (memory was not allocated)
{
    // avoid doing anything that would require that data
}
...now, if not NULL, md is a pointer to your mydata struct, and you can proceed. Do keep in mind, though, that although we check that pointer for success here (compare it against NULL or check for a SVC_OK response), that's not enough: you also need to know in your later *engine() invocations whether you have it or not, so that's another place you need to (re)test it to see if it is good, just as shown here.
As mentioned below in "Threading gotchas", you also have to be careful about writing the same data from multiple threads; the question of who wrote the most recent information becomes unanswerable. So when setting up (writing to) a structure like this, we suggest that you use a non-threaded phase. Try to minimize the work done in such phases, as (obviously) they can be much slower on multicore machines.
The second method is similar, but instead of having the application handle the memory via the beer mechanism, you stuff it in a layer.
There's a significant benefit here: information you calculated in pengine() will be available to dengine() not only when control settings are changed by the user and between phases, but also when only dengine() runs (because the image is being redrawn, but the user hasn't changed the settings on your plugin.)
Again, you start by defining the data you need to hold in convenient structure form...
struct mydata
{
    long values[20];
    float limits[32];
    unsigned short r_palette[16384];
    unsigned short g_palette[16384];
    unsigned short b_palette[16384];
    unsigned char string[40];
};
Now, in setup, you add this size to the size of one of the layers you are allocating...
req_lr = my_rounded_size + sizeof(struct mydata);
...then you can get to the structure this way...
md = (struct mydata *)&(p->lr[my_rounded_size]);
...or allocate a layer that just contains this, as we do here...
req_lr = sizeof(struct mydata);
...then you can get to the structure directly...
md = (struct mydata *)&(p->lr[0]);
...now, we guarantee that layers will be long-aligned; that is, the memory will start on a four-byte boundary compatible with fast 32-bit execution. But if you were to allocate a 16-bit channel with an odd number of elements, then a structure stored at the end of it would begin two bytes past a four-byte boundary, and would no longer be long-aligned. The fix is to round the size of your channel request up to an even number. That's what my_rounded_size is. Say it turns out you needed 101 shorts. You need to round that up...
my_rounded_size = 101;      // initial calculation or assignment

if (my_rounded_size & 1)    // if (size is odd)
{
    my_rounded_size += 1;   // ok, now it isn't odd
}

// and now it's safe to tack a structure on:
p->req_lr = my_rounded_size + sizeof(struct mydata);
...which you would obtain access to later, this way...
struct mydata *md;

md = (struct mydata *)&(p->lr[my_rounded_size]);
...and now the data in the md structure will persist not only between phases and engine calls, but for the whole time the plugin is applied to the image (unless, of course, you change it.) Keep that significant difference in mind when choosing between method two, here, and method one, above.
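To make that concrete, here is a minimal sketch of method two in action, assuming the structure above lives at the start of a layer allocated with req_lr = sizeof(struct mydata); the specific field uses are purely illustrative:

/* In pengine() -- only runs when the user changes a setting: */
struct mydata *md;

md = (struct mydata *)&(p->lr[0]);   // the structure lives inside the lr layer
md->limits[0] = p->f[0] * 0.01f;     // cache some derived value here (illustrative)

/* In dengine() -- runs on every redraw, even redraws where pengine() did not
   run first; the cached data is still there because it lives in the layer: */
md = (struct mydata *)&(p->lr[0]);
/* ... read md->limits[0], md->r_palette[], and so on ... */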
We've made threading so easy here that odds are, you're going to try to thread everything you do. But there are pitfalls lurking in certain types of processes; in particular, any process that looks at more than one pixel at a time in the output channels is prone to failure. This is because you simply can't know if the pixels you're looking at have already been processed or not. Maybe some thread has already been all through there; maybe this thread is the first thread and no pixel has been touched. You just can't know. So you can't depend upon previous output values when you're threading.
If your plugin works when you don't request threading, but stops working when you do, that's a red flag telling you that you're trying to write the same data from more than one thread, or reading data before it has been written. Either don't do that, or split the job into multiple phases (see below), or (last resort only) don't ask for threading. That last option is not such a hot idea because it can slow your plugin down. A lot!
You can tell the plugin system that you want multiple phases. In this way, phase 0 can, even multithreaded, set up all the pixels into the state that you need them in; the second phase is not started until all the threads have (or the thread has) completed the first phase, so now there is a guarantee that all pixels have been processed by your first phase activity. Which phases are multithreaded is set by the dphase[32] and pphase[32] arrays for up to 32 phases. This can help you avoid reading data with one thread while writing the same data with another.
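As a purely illustrative sketch of the idea (the value convention for these arrays, and the p-> prefix, are assumptions here, not something this document states; developer.h is the authority on what the entries mean and how the phase count is declared), a three-phase dengine() might be marked up in setup() like this:

/* ASSUMPTION, for illustration only: a nonzero entry marks a dengine() phase
   as threadable, zero keeps it single-threaded. Check developer.h for the
   real convention before relying on this. */
p->dphase[0] = 1;   /* phase 0: independent per-pixel prep, safe to thread        */
p->dphase[1] = 0;   /* phase 1: writes one shared structure, keep single-threaded */
p->dphase[2] = 1;   /* phase 2: per-pixel output, safe to thread again            */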
Q: How do I create a whole-image replacement, for instance if I painstakingly noise-reduce the whole thing with my ultra-sophisticated fractal-hypno-fourier-wavelet pranking-perceptual-pixel-busting algorithm, but I don't want to re-process the user's image every time it's drawn?
A: Easily enough. Set req_lr, req_lg, and req_lb all to -1. This will give your effect RGB layers equal to the image size. Stick anything you want in those layers in pengine(), then in dengine(), simply copy them to sr, sg and sb. pengine() is only called if the user changes settings or when the image is first loaded with your effects active. It's a good idea to allow for an alpha channel here, so the user can paint the new image data in and out: just set promotable=1. Or you can use a private alpha channel to pop an image over the user's image in a controlled manner, as with a logo, picture frame, sigil, or watermark.
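A sketch of that flow, using the same field names as the rest of this document (the green and blue layers are assumed here to be reachable as p->lg and p->lb, by analogy with p->lr; check developer.h for the actual names):

/* In setup(): ask for full-image RGB layers. */
p->req_lr = -1;     /* a layer the same size as the image's red channel */
p->req_lg = -1;     /* ...green                                         */
p->req_lb = -1;     /* ...blue                                          */

/* In pengine(): run the expensive algorithm once and write the finished image
   into the layers. This only happens when settings change or when the image
   is first loaded with the effect active. */

/* In dengine(): just copy the precomputed layers to the output channels.
   (p->lg and p->lb are assumed names, by analogy with p->lr.) */
long i;

for (i = p->start; i < p->finish; i++)
{
    p->sr[i] = p->lr[i];
    p->sg[i] = p->lg[i];
    p->sb[i] = p->lb[i];
}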
Q: Speed?
A: Yes, well. Speed. Very important. Most important in dengine(), because every time the image is redrawn, it will go through your dengine() if your effect is enabled. It's nice if pengine() is fast as well, so that it is more responsive to settings changes, but really the idea here is to put the time consuming calculations into pengine() — somehow — and to make dengine() as fast as humanly possible. For instance, it's not even considered unreasonable to distrust the compiler and make all pointer references direct. In other words, instead of this beautifully simple approach in our brightness example...
long i;

for (i = p->start; i < p->finish; i++)  // 1-dimensional pixel processing
{
    p->sr[i] = p->lr[p->mr[i]];         // remap pixel brightness
    p->sg[i] = p->lr[p->mg[i]];         // ...using 64k table in layer
    p->sb[i] = p->lr[p->mb[i]];         // ...
}
...*this* could be faster with some (stupid) compilers:
long i;
unsigned short *sr,*sg,*sb;
unsigned short *mr,*mg,*mb;
unsigned short *lr;

mr = p->mr; mg = p->mg; mb = p->mb;
sr = p->sr; sg = p->sg; sb = p->sb;
lr = p->lr;

for (i = start; i < finish; i++)        // 1-dimensional pixel processing
{
    sr[i] = lr[mr[i]];                  // remap pixel brightness
    sg[i] = lr[mg[i]];                  // ...using 64k table in layer
    sb[i] = lr[mb[i]];                  // ...
}
...and, if the authors of the compiler were drunk and stoned when they wrote the compiler, also perhaps in a hurry and going out for pizza instead of coding to the metal like they should be, it is possible that this might be faster yet:
long i;
unsigned short *sr,*sg,*sb;
unsigned short *mr,*mg,*mb;
unsigned short *lr;

mr = p->mr + start; mg = p->mg + start; mb = p->mb + start;
sr = p->sr + start; sg = p->sg + start; sb = p->sb + start;
lr = p->lr;

for (i = start; i < finish; i++)        // 1-dimensional pixel processing
{
    *sr++ = lr[*mr++];                  // remap pixel brightness
    *sg++ = lr[*mg++];                  // ...using 64k table in layer
    *sb++ = lr[*mb++];                  // ...
}
...Well. That last might have been a trifle unfair [shakes head violently, winks]. Some CPUs have cost-free increment and store as well as increment and load instructions, and the above is basically the "c way" of saying, "use those, compiler." But modern pipelines, large register sets and smarter compilers should — hopefully — obviate these kinds of hand optimizations. And CPUs tend to have really fast register addressing using another register as an offset, too, so the form x[i] should be just as fast. Again, unless the compiler writers were having a really off day.
But... look. When in any doubt at all, benchmark it. Set the plugin (or the app) for single threading (threading, while generally faster overall, introduces all manner of timing uncertainties), and time your effect repeatedly on a large image, using each of the various approaches. Switch it on and off with the master checkmark for your panel. If one of these approaches is markedly faster, then by all means, use that one.
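If you'd rather time things from inside the plugin while you experiment, something as crude as the following wrapped around the dengine() loop will do (with threading off, as suggested above). This is a throwaway diagnostic built on standard C and POSIX calls, not part of the plugin interface:

#include <stdio.h>      /* these two go at the top of the file */
#include <sys/time.h>

struct timeval t0, t1;
double ms;

gettimeofday(&t0, NULL);

/* ... the dengine() pixel loop you are benchmarking goes here ... */

gettimeofday(&t1, NULL);
ms = (t1.tv_sec - t0.tv_sec) * 1000.0 + (t1.tv_usec - t0.tv_usec) / 1000.0;
fprintf(stderr, "dengine slice: %.3f ms\n", ms);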
Please consider supporting my dTank (β) development efforts via a small PayPal donation.