Software Development

Seeking a programming middle ground

Justin James feels as though the programmers who do the more logic-intensive work that he's attracted to are either uninterested in or unable to turn it into a useful, usable application. So where does he go from here?

The coincidental events in my life keep pushing me to think that the programming paradigm needs an update. Over the course of the last week, I stumbled upon confirmation that my suspicions regarding hardware are correct.

It crossed my mind that current hardware models are lousy at expressing parallel code. It turns out that programmers have been using graphics processing units (GPUs), which are typically found on video cards, to write parallel code for nongraphics purposes for quite some time. Also, many of the CPU instruction set extensions aimed at gaming, such as MMX, SSE, and 3DNow!, are designed for parallelism.

(I am sure that at least a few of you are chuckling because you've been wondering how long it would take me to notice this. Thanks for not cluing me in because I really enjoyed this learning process.)

Unfortunately, GPUs are optimized for working with graphics, which means performing mathematical calculations and not much else. The GPU is designed to perform the exact same calculation on a lot of numbers in tandem but not cooperatively. In other words, the operations do not interact with each other at all. Another problem with this approach is that the languages used are much lower-level than what most programmers work in. Most programmers work in Java and .NET for pretty good reasons, and (as far as I know) neither of those platforms allows working directly at the hardware level (at least not without some fancy footwork). Both run on their own VMs, which abstract the underlying hardware away completely.
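To make that "in tandem but not cooperatively" distinction concrete, here is a minimal sketch in plain Java (chosen purely for illustration; the class name and values are invented, and nothing here touches an actual GPU): an element-wise operation whose outputs depend only on their own inputs fits the GPU model, while a running total, where each step needs the previous one, does not.

```java
import java.util.Arrays;

// A minimal sketch of the difference between "in tandem" and "cooperative" work.
public class TandemVsCooperative {
    public static void main(String[] args) {
        double[] input = {1.0, 2.0, 3.0, 4.0};

        // GPU-friendly shape: each output depends only on its own input, so
        // every element's calculation can run side by side with no communication.
        double[] squared = Arrays.stream(input)
                                 .parallel()
                                 .map(x -> x * x)
                                 .toArray();

        // Not GPU-friendly in the same way: a running total makes each step
        // depend on the one before it, so the elements have to cooperate.
        double[] runningTotal = new double[input.length];
        double sum = 0.0;
        for (int i = 0; i < input.length; i++) {
            sum += input[i];
            runningTotal[i] = sum;
        }

        System.out.println(Arrays.toString(squared));      // [1.0, 4.0, 9.0, 16.0]
        System.out.println(Arrays.toString(runningTotal)); // [1.0, 3.0, 6.0, 10.0]
    }
}
```

The parallel stream is only a stand-in for "hand each element to its own pipeline"; the running total shows the kind of cooperation that the GPU model described above does not offer.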

Where are we going with this? Well, I like the concept: it points to the idea that arrays of processing pipelines can make parallel computing relatively easy.

Let's pretend that we have a language that emulates such an environment on a standard, general-purpose CPU like the x64 architecture. This probably wouldn't get us much. Outside of multimedia content creation (the days of upgrading your PC to have enough horsepower to decode an MPEG are long past), the vast majority of applications simply are not CPU bound; they are often not I/O bound either. After all, they are not reading and writing huge amounts of data at once; at most, they stream data from a source at relatively low speeds. I will go as far as to say that the only real performance bottleneck for the majority of users out there is physical RAM: hitting the swap file is the big slowdown.

Let's imagine another example where the command line is still king. Outside of that, everything else is the same as our current environment. We use Lynx or something similar for Web access; vi, Emacs, or WordPerfect 5.1 to handle text input; and Pine (with all of Outlook's functionality) as the e-mail client of choice. These apps would barely touch the CPU, because it is the fancy graphics that put the hurt on our PCs, not what the applications themselves are doing (except in very rare cases).

Even on the server side, the story is much the same. The CPU gets creamed mainly because the OS has one or two physical cores trying to timeslice one thread per request. It is the cost of context switching that hurts; this is the server scalability problem. No individual request really takes more than a small percentage of CPU time, but once you have more than a few requests, the overhead of simultaneous processing artificially inflates the CPU needs.
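To put the context-switching point in code, here is a minimal sketch in Java (the class name and request count are invented for illustration): one path spawns a thread per request, the other hands requests to a pool sized to the physical cores, so the number of runnable threads, and with it the switching overhead, stays bounded.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: thread-per-request vs. a pool sized to the physical cores.
public class RequestHandlingSketch {

    // One thread per request: with hundreds of in-flight requests, the OS
    // spends a growing share of CPU time just switching between them.
    static void threadPerRequest(Runnable request) {
        new Thread(request).start();
    }

    // A fixed pool: requests queue up, but only as many threads run as there
    // are cores, so the context-switching overhead stays roughly constant.
    static final ExecutorService POOL =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    static void pooledRequest(Runnable request) {
        POOL.submit(request);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            final int id = i;
            pooledRequest(() -> System.out.println("handled request " + id));
        }
        POOL.shutdown();
    }
}
```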

I feel very alone right now in the programming world. I feel like the programmers who do the more logic-intensive stuff that I am attracted to are just not interested in or able to turn it into a useful, usable application. The programmers trying to build applications have this bland "me too!" style of processing under the hood. The technologies that work great for logic do not seem capable of being used in a system that real users can use, and the systems that work great for real users cannot seem to handle tricky logic. Where is the middle ground? I am trying to get a little bit closer to the envelope, and I am really stuck between the hammer and the anvil.

So where do I go from here?

J.Ja

About

Justin James is the Lead Architect for Conigent.

21 comments
gpopkey

Hi Justin, You have described a problem that keeps recurring. Users define a need, programmers attempt to meet the need, but hardware structures prevent some of that. Programmers (like yourself) need to define the generalized controls that have to be implemented in hardware and software to permit good, fast parallel processing, so that engineers can design computers that appropriately implement those controls (which have to be simple to implement), before we'll see significant gains through parallelism, virtualization, and concurrent processing.

Engineers working on massively parallel processors and computers are trying to develop the structures that allow programmer control over how the thing actually works. Then programmers have to develop language(s) allowing those capabilities to be fully utilized, and then work those language features back into the easier-to-use and easier-to-maintain languages and programs needed by our users. Maybe we have to get away from von Neumann processors and into processors more closely resembling brain neurons. The cross-communications between processors that are felt necessary are almost beyond comprehension. We aren't there yet as far as I know. The fact that we can take advantage of the power of graphics processors for some tasks is in itself remarkable. However, the use of such facilities is not standardized and is not expected to be standardized, so it is unlikely to be made easier to use in common computer languages, because the next hardware version would likely implement completely different instruction sets, invalidating any routines using current hardware facilities.

SoftwareMaven

And by fault tolerance, I don't mean that "an exception was properly handled", but rather that applications can handle some incorrect data processing yet still come out with a correct answer. Our brains are really good at parallelism because they are also good at managing faults. Incorrect processing is generally quickly identified and resolved, so partial data and shared data can quickly be used to begin processing a solution, and the solution can be adapted quickly as changes come in. Software systems don't have the fault tolerance to deal with partial data, faulty data, and rapidly changing data. I'm with you, Justin. A paradigm change is needed, and it needs to happen in the hardware, the languages, and the developers. Unfortunately, my prediction is that it is still five to eight years out. Travis

rclark

There is currently a need for parallel processing because the hardware has not evolved far enough. Just as at the beginning there were problems with disk drives until they put controllers onboard, and with serial ports before UARTs, DSP chips, controllers, and so on, each evolution of the hardware cycle creates a need for a control structure. When there was one CPU, there was no need for a controlling CPU. When there were two, there began to be a need, but it could be handled by a simple binary switch which said who was least busy. Now we have quad cores. At this point, we have probably reached the maximum number of cores that can be effectively managed without the evolution of hardware to effectively utilize them.

I imagine there is some internal routing of requests to level out the load, but to do it system-wide, there needs to be a new component. Call it a Central Control Unit. This unit would shift requests between the available CPUs, and each CPU would in turn shift between its cores. So at that point, you would write your code as if it would run on a single-core, single-CPU system. The assembler on the system would assemble the compiler's output to take advantage of the multiple CPUs and multiple cores, and the CCU would parse it for parallel processing efficiency, single threading where necessary for concurrency and multithreading where possible to reach gains in processing throughput. Maybe a bit simplistic, but it seems that every time we run into a bottleneck, the answer is to double the working components and add a supervising controller.

kpthottam

Let me begin with an example that requires both ends of the spectrum. CRM applications have to do two things really well: 1) work with terabytes of customer information, working through it to find the targeted segment of customers; 2) provide a highly business-user-friendly user interface, a non-SQL, non-programming paradigm, that will allow the user to exploit the "logic-intense" work done by group 1. Technology as it stands today forces us down these two very diverse paths, and merely offers integration technologies to allow these two ends of the spectrum to use each other. With my limited imagination it is hard to see that middle ground being achieved. Think of the physical world: you have rockets for speed, where wing span is a liability, and airplanes, where wing span is a necessity. The shuttle could be considered the closest middle ground, but if it were a truly viable (cost-wise) middle ground, all other nations would duplicate it.

Jaqui

apache? it actually handles multiple concurrent requests extremely well. [ better than any other server app outside of database servers, and even there it's on a par ] multiple threads and multiple children processes can actually combine to have a whole heck of a lot of instances of apache working at once. [ I'm sure you know apache is more than just a web server, it is an authentication server, file server, application server as well ] hmm apache and emacs, the most complete operating system available to man in the fewest number of applications. :D define a "useful, usable application." that is such a huge range that without specifics I can't see where your logic may be failing.

Tzekov

Hi Justin, If I understand you correctly, you want to tell us that the idea of "virtual" memory is wrong? If "yes", then it is a good starting point for a new IT revolution. Does anybody have an idea how we can avoid virtualisation?

avacoder

Everyone has touched on the core of the evolution of computing... industrialization of lessons learned. Von Neumann modeled linear processing --> programmers internalized the lessons and pushed the limits of the hardware --> hardware vendors industrialized the patterns and eliminated the limits --> programmers pushed the limits again --> etc. No one is re-examining the fundamental assumptions of von Neumann machines, and that's what Justin is talking about here with regard to a 'paradigm shift'. It is possible to break down large problems into simpler pieces that lend themselves to von Neumann solutions... it's re-integrating the 'little solutions' back into the larger solution that's so hard to do... Justin's point about cross-thread communication. The nuts and bolts of thread synchronization have yet to be industrialized at the hardware level. The question I'm asking is: is cross-thread synchronization the problem, or is the threading 'model' the problem?

Tzekov

:-) I still remember the days when I was a student (mid '80s) and our professor, teaching us "Architectures of Computing Systems", mentioned coming parallel architectures with 64 and more processors (the difference between cores and physical processors is not big at a higher abstraction level), neural networks, and many other things that would bring us a step closer to "human brain" processing. I was impressed. And here we are, 20 years later, and still no commercial implementation of all these ideas. :-( Maybe the problem is in these simple "0"s and "1"s ("black" and "white") standing behind all our IT systems? Our brains are analog processors, not digital processors. But I'm still looking forward to seeing something real moving us in that direction. I'm not against the paradigm shift, but I'm not such a big optimist. 5 to 8 years is too little time to make this shift. How much time was necessary for von Neumann's ideas to become reality for mankind? (20-30 years?)

Justin James

I have thought along those lines as well. I think that the current hardware model common in the x64 market is a huge contributing factor in this. One thing of note is that Intel is working on essentially "rolled up" multi-core machines operating much like you describe: taking a standard multi-core machine, but rolling the cores together to appear as one. J.Ja

SoftwareMaven

I agree this is an important piece, and it will be one of the pieces that continue to expose the need for VMs that can take the running application context and wrap it around the current hardware context. Unfortunately, languages are not expressive enough today to give the proper level of context. Travis

Justin James

Jaqui - Apache is a good dispatcher, but that is about it. In fact, when you really look at the heart of it, all Apache is is a multiplexer for STDIN, combined with a default behavior of serving static file data. That being said, it is excellent at what it does. But it falls apart when you need interprocess communications. One reason why it is as fast, secure, and stable as it is, is precisely because it does not let the processes that it spawns talk to each other.

Look at the PHP session model; it is a perfect example of how Apache's model becomes rather inefficient as soon as IPC is required. In fact, PHP's session model is not even IPC, shared memory, or anything of the sort... at best, it can be described as a mutex-locked memo pad to leave "while you were out" messages to other processes. I know that PHP's session model is not written to be Apache-specific, of course, nor is it really an Apache issue per se, but it is indicative of what happens when Apache child processes need to communicate. Indeed, can Apache's ability to MUX be attributed to Apache, or to the base OS? I am not knocking Apache here, just saying that it is not such a hot option for anything other than processes running traditional POSIX-style processing with little to no need to communicate between processes.

"Useful, usable application"... I may have to be a cheapskate and "define by example":
* Photoshop
* Excel (tons of hidden functionality in the Perl "DWIM, NWIS" style in Excel, believe it or not)
* Perl (yes, at this point, I consider Perl to be an "application" more than anything else, ever since I stopped using it as a general purpose language a while back)
* paper and pen, great combination :)
* tinyurl.com (shame it is needed in the first place)
* Visual Studio (most of the time, sometimes I hate it!)
* SQL (the language itself, it more perfectly encapsulates its specialized problem domain than most languages do)

J.Ja
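That "mutex-locked memo pad" description can be sketched roughly as follows. This is not PHP's actual session implementation, just an illustrative Java rendering of the pattern, with invented class and file names: each worker writes its note to a per-session file under an exclusive lock, and whoever handles the next request for that session reads whatever was left behind; at no point do the processes talk to each other directly.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

// Sketch of a "memo pad" session store: one file per session, guarded by an
// exclusive file lock, so independent worker processes never talk directly.
public class MemoPadSessionStore {
    private final Path dir;

    public MemoPadSessionStore(Path dir) { this.dir = dir; }

    // Each write grabs the lock, overwrites the memo, and releases the lock.
    public void write(String sessionId, String value) throws IOException {
        Path file = dir.resolve(sessionId + ".session");
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING);
             FileLock lock = ch.lock()) {
            ch.write(ByteBuffer.wrap(value.getBytes(StandardCharsets.UTF_8)));
        }
    }

    // Reads just pick up whatever the last request left behind: no
    // process-to-process communication, only a note for whoever comes next.
    public String read(String sessionId) throws IOException {
        Path file = dir.resolve(sessionId + ".session");
        if (!Files.exists(file)) return null;
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }
}
```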

Justin James

I do not think that virtual memory is really the problem per se; it is more in the way the hardware handles concurrency. The GPU system is fine for operations that do not need any concurrency or data sharing, but that is such a rare thing in most applications. The CPU model has no way to effectively harness multiple processors without a huge amount of error-prone "black magic", where code that is speedy on one system will be slow on another, forcing most programmers to leave all but one core idle. There needs to be a better hardware model, with the accompanying programming techniques! J.Ja

Justin James

I think on different hardware, threading would probably be quite different. Unfortunately, threading, processes, etc., simply do not exist at the hardware level. These are strictly the domain of the OS, and no matter what, it is going to be doing some sort of timeslicing somewhere, thread/process idle and zombie handling, and so on. Not that this should all be in the hardware, of course, but with the possible exception of some advanced supercomputer-style hardware, the CPU is not even aware that a core can run commands that are not related to each other logically at all. J.Ja

Justin James

You've hit it on the head. One of the defining characteristics of the human thought process is its amazing ability to automatically drop the "insignificant digits", filter out the "bad bits", and generally have the kind of fault tolerance you describe. For example, consider the common "teasers" where you see a sentence with horrible typos, yet you have no problem reading it. I cannot imagine a piece of software that could work if, say, 25% of the RAM reads came back with utter garbage, but the human mind can. In fact, one thing of note is that in many cases, the human brain is better at making certain types of snap decisions than at studied thought, like some danger responses. This is definitely a hardware problem at this point. J.Ja

chad_forte

...have been around since the 1970s. Today, there is virtually no limit to the number of processors you want to run as a single system image on high-end IBM machines.

Jaqui

* Photoshop
* Excel (tons of hidden functionality in the Perl "DWIM, NWIS" style in Excel, believe it or not)
* Perl (yes, at this point, I consider Perl to be an "application" more than anything else, ever since I stopped using it as a general purpose language a while back)
* paper and pen, great combination :)
* tinyurl.com (shame it is needed in the first place)
* Visual Studio (most of the time, sometimes I hate it!)
* SQL (the language itself, it more perfectly encapsulates its specialized problem domain than most languages do)

How can these applications be improved with threading? The slow point is that the user needs to react to the output to start another action, so doing multiple actions at the same time would most likely ruin their usefulness. In Photoshop, for a quick example, changing the lighting in an image and then adding pixel depth while the new lighting data is being calculated would cause problems with the image itself. Doing both in sequence [ either way ] stops the calculations from having a negative impact on each other [ such as slowing the lighting calculations to a crawl as they have to restart every time a new pixel is added, to make the lighting fit the new pixel correctly ].
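That sequencing argument comes down to a data dependency. Here is a minimal sketch in Java (the method names are hypothetical stand-ins, not anything from Photoshop): when the second edit consumes the first edit's output, they have to run one after the other no matter how many cores are available, while edits that share no data can overlap freely.

```java
import java.util.concurrent.CompletableFuture;

// Sketch: dependent edits must run in sequence; independent ones can overlap.
public class EditPipelineSketch {

    // Stand-ins for real image operations (hypothetical).
    static int[] adjustLighting(int[] pixels) { return pixels.clone(); }
    static int[] addPixelDepth(int[] pixels)  { return pixels.clone(); }

    public static void main(String[] args) {
        int[] image = new int[1024];
        int[] other = new int[1024];

        // addPixelDepth needs the *result* of adjustLighting, so the runtime
        // has no choice but to run them one after the other.
        CompletableFuture<int[]> edited =
                CompletableFuture.supplyAsync(() -> adjustLighting(image))
                                 .thenApply(EditPipelineSketch::addPixelDepth);

        // Two edits on *different* images share no data, so they can safely
        // run at the same time without stepping on each other.
        CompletableFuture<int[]> a = CompletableFuture.supplyAsync(() -> adjustLighting(image));
        CompletableFuture<int[]> b = CompletableFuture.supplyAsync(() -> adjustLighting(other));

        CompletableFuture.allOf(edited, a, b).join();
    }
}
```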

Justin James

You got me on that one; I failed to properly qualify the statement with something like "mainstream hardware". Indeed, Cray was doing just this nearly 30 years ago, and of course, GPUs do it now as well (if you don't mind representing everything as shader operations :) ). So yeah, you're right, I was indeed being overly pessimistic there. J.Ja

rclark

They don't exist now on some boxes. On others they do. And the definition of supercomputer keeps changing. The point is that as time goes on, the set of things we take for granted versus the things we have to handle ourselves changes. Perhaps there are things hardware is too rigid to do. I haven't found one yet, but perhaps one day that thing will raise its ugly head. Until then, if you can define it sufficiently to a gearhead, they can build it. There is no appreciable difference between hardware and software, only in the methods of production. So if you can program it in code, the gearheads can eventually encapsulate it in silicon and get the same functionality in hardware. If it is worthwhile, they eventually will. The real problem is that each application is so different that the return on investment is too low to generate a profit in making software into hardware. So what we normally get in hardware are the overreaching solutions to common problems, not a one-off designer chip that solves your particular problem. The result is that it's not cost effective to design and build a solution. But that doesn't mean it will always be that way.

Vladas Saulis

I agree that processes and threading are mostly in the OS domain. On the other hand, I can see significant support at the hardware level too. Remember the 8086/80286 processors? Both had little or no support for multitasking/multiprocessing. Some CPU registers, such as the PSW, are designed specifically to support multitasking (and these actually existed long before x86 CPUs). In the mid '60s, IBM designed the OS/360 series, where the way we do computing to this day was finally standardized. IBM introduced the PSW, the stack pointer, and some other things (such as soft interrupts for paging memory). Since then, we have been 'captured' by all the existing programming models. The worst of these inventions, IMHO, was stack operations and the pushing of that standard into all aspects of programming. I think the 'stack paradigm' is only good for sequential processing (and so it was in the '60s). For parallel programming it does more harm than good. And most programmers today cannot even think in terms that are beneath the use of the stack. Multithreading, in my mind, is only a pervasive way to 'imitate' parallelism, which is in most cases forced by current hardware and software techniques. I'm glad to see more and more people who have just begun to realize that.

rclark

When this "Computing Thing" started, we the programmers had to control everything. As time went on, more and more has been switched into hardware, leaving us to go on to bigger and better things. That spelling garbage trick is simply first and last character correct, then it's this word. Move the first and second letters and/or last two letters and our brains scramble to define what the word is. Simple programming construct to implement. Why doesn't anyone do it? M$ has for known case errors in their spellchecker. But mostly, people do know how to spell close enough for other people. Not really close enough for machines. But the human brain does this to abstractions also. Two face vase, young old women, dancing horses/lying women, etc. all are visual abstractions that the brain can see. A simple stick box turns inside out. The back side of a transparent mask is visually inverted until you turn it past 46 degrees. All of these things are the brain filtering your input until a known connection is made and suddenly it snaps into focus. One of the neat things going on right now in robotics is teaching a robot to actually see as humans see. The camera has much better definition than we, yet we can instantly recognize objects and robots can't. It's not because the data is garbled, and it's not because the detail is missing. I am in hopes of those 64 core CPU's that are coming. When they get here, with an OS that will treat them as a single thought processor, we will have the power to do pixel associations by color recognition and finally, object recognition. Just analyzing movements now is difficult. There is a bright world coming, but it will get here by coming one piece at a time, one task at a time, one paradigm shift at a time. Everyone here, in the trenches will have to put shoulder to wheel and push. We will all have new "War Stories" about how it was back in the day before xyz came along. I don't look for massive changes, but I do look forward to continual evolution of hardware, software, and programming paradigms.

rclark

How about analyzing discrete objects in the photograph so that they can be handled as objects? How about cleaning up artifacts in the photo without having to zoom to 1600% and manually use a pixel brush? How about a pixel that is itself an object with more than a color value? We think of pixels as a combination of three colored dots. Why can't they be rods, circles, squares, polygons, and triangles that have color attributes, size attributes, texture attributes, etc.? The list of things you can do is unlimited. How about a parallel search for objects approximately matching each item, one that redefines what a picture is? As each object is identified, it is sharpened until it's the best available known object, so that fuzzy, out-of-focus pictures are a thing of the past, except for abstract art photos.
