Wednesday, April 10, 2013

Tagonomy for Advertisers

Wherein I break down how we all consume content; how we are all content; how we can all get paid for our content of likes and tags; and how this could make for an interesting new web and mobile business landscape.

What we've been doing for years is selling our tags, albeit more anonymously than we do now. When you buy a magazine, say "Dwell," you are giving up some of your time, and maybe something you value, like enjoyment or aesthetics. (My project defines a user-managed rating of posts called AdNasty, where you get to rate how much the "Ad-ness" nasties you out, so you can share your aesthetic judgement of the content relating to advertising. The idea is that in a user-generated-content system, everyone is pitching something, and you want to rate how nasty and obvious the pitch is.) Some of the content you will see will be observed as "ads." Some of this is paid for, some indirectly paid for. What you get in return is a lower cost for your purchase. For US$6.00 you can buy this magazine from the rack; it comes on nice paper and has about two good shoots, totalling maybe 30 pages of content you paid to see. But the fact is that we are also subscribers to the content known as "advertising," because the advertisers have been subsidizing the newsstand price. And this coupling of ad space and very targeted content means the tag "Dwell Magazine" is monetized.

Recent news item: Facebook performing data mining on "Likes." Duh.

We are all content now. ("Content" in the last sentence is a noun.) We sell our content in exchange for information (content) or for reduced prices on real-world things. But most of what we do online is information. You are using the web to find the bookstore that will sell you a book, a shipper that will take the book from the store to you, and the middlemen who get your order from San Francisco to Peabody, Indiana, and your book back from there to you. Sitting in the middle of this information trade, in one transaction, might be Amazon and UPS. So brokering, connections, availability, and trust are monetized. When the book itself is digital, the nature of the transaction is even more stark.

I'm suggesting that we create a means for people to categorize everything they wish to store and retrieve using a system of hierarchical tags. The premise is that these people would "garden" their tags a bit, because the tags help them remember and find things. And this system would provide a nice context-sensitive GUI for tagging in place, with recall of recent tags, autosuggest as you type, and so on. Because the system supports hierarchical tagging, and the ability to store lists in this tree, it supports how people think: tree, tabular, or flat. Emphasis is placed on keeping the system simple, so graphs are not directly supported (you can still have links). The idea is a system of tagging and recalling tags throughout your web/mobile experience, where the system supports everything from a simple list of commonly used tags to trees of tags in a hierarchy, such as a mid-level or power user would use.
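
As a sketch of the kind of tagging service I mean (all names here are hypothetical, not part of any real implementation), even a flat store of hierarchical path strings gives you recall of known tags and autosuggest-as-you-type:

```javascript
// Minimal sketch of a tag store. Tags are path strings like
// "/projects/tagonomy/"; the hierarchy is implicit in the paths,
// and autosuggest is just a prefix match over known paths.
function TagStore(){
  this.paths = [];            // every tag path this user has gardened
}
TagStore.prototype.add = function(path){
  if (this.paths.indexOf(path) === -1){
    this.paths.push(path);
  }
};
TagStore.prototype.suggest = function(prefix){
  // recall-as-you-type: known paths starting with what the user typed
  return this.paths.filter(function(p){
    return p.indexOf(prefix) === 0;
  });
};

var store = new TagStore();
store.add("/projects/tagonomy/");
store.add("/projects/blog/my-entries/");
store.add("/work/");
store.suggest("/projects/");   // both /projects/... paths come back
```

A real system would layer recency, per-context defaults, and the list-in-tree storage on top of this, but prefix matching over gardened paths is the core of the in-place tagging GUI.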

The collection of tags becomes like a signature of a person, especially if the usage and application of tags is used as a learning network. If these signatures could be normalized, then they can be mined, and then they are monetizable. So a consortium could entice users to use the tagging system by offering to pay them a portion of CPI (Cost Per Impression, typically quoted in dollars per 1000 impressions or "views" of an ad). In other words, sign up for these really, really targeted ads, and don't get any others from the ad servers. Now these really, really targeted ads are really, really interesting, and they are, effectively, "content." In fact, they won't look like ads at all; they will look like offers of content that the user wants.

As an advertiser, the value, in terms of real CPI, of these users is high. Intuitively, it seems to me to be an order of magnitude better in terms of response rates, when compared to ads that may be selected from keywords in my emails. I have gardened my tags, so the tags I use most frequently have meaning to me, and characterize me better.

To normalize tags, I'm proposing a system that stores trees of tags, like a taxonomy.

I'd allow aliases (soft links) from nodes to the same tree (by ID), or to external trees (HREF or XLink).

Similarly, allow the importing and aliasing of external trees, especially from other users. Even better, encourage and promote the idea of importing well-known taxonomies. Allow there to be multiple taxonomies, but become good at correlating them, partly by encouraging linkage. The fact that there are different camps of taggers that can talk to each other, agreeing on their disagreements, is a good thing.

When a tag is used, its canonical location is referred to, along with a qualifier between 0 and 1, or between 0% and 100%. This qualifier specifies how true this tag is for the given application, with 1% meaning "I kinda think there's a bit of /likes/movies/Avatar/ in this blog post, but just barely." As a user, if I qualified this blog post with the amount of each tag, then the tags for this article would be:

  • 90% "/work/"
  • 100% "/projects/"
  • 100% "/projects/blog/my-entries/"
  • 50% "/projects/tagonomy/"
Here 90% "/work/" implies that this is a project for the /work/ portion of my life, but I wouldn't want to exclude a future search from looking for interesting writing I had done under "/play/". The "/projects/tagonomy/" tag applies because the ideas here are based on my "tagonomy" project. But the tag doesn't really apply 100%, because this article goes on to talk about advertising. The percentages are the amount of a given tag that you want to sprinkle on this document; 100% means that when you are searching later, this tag is spot-on. In the other direction, I think it may prove better to figure out which tags define "/work/" by inference and link counting. Alternatively, a user could have two sliders: 1) for how much this tag ("/work/") is "so true" for this item, and 2) for how much this item is "exactly what I'm talking about" when I say "work". Either way, the system needs to calculate back-links to definitions. Using this information, and the information gleaned from links, multiple trees can be correlated into camps of thought.

I was surprised, once, after using a bookmark sharing and correlating site, when it compared my tag cloud with other users and found matches showing that other people existed who not only shared interests with me, they shared a lot of interests with me. And these were in such a tight pattern that, judging by their links, they were basically me: guys interested in Delphi programming, bicycling, guitars, ProTools recording gear, Spanish language learning, and musical note notation software. OK, that is probably 2 million people in the U.S. But even walking into a party of geeks, or a Meetup, wouldn't guarantee that I'd meet any of them.
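
The per-item qualifiers above can be sketched as a plain record (a sketch only; the names are hypothetical):

```javascript
// One item's tag qualifiers, as listed above: a weight between 0 and 1
// per canonical tag path.
var postTags = {
  "/work/": 0.9,
  "/projects/": 1.0,
  "/projects/blog/my-entries/": 1.0,
  "/projects/tagonomy/": 0.5
};

// A later search can rank this item by the qualifier for the queried tag;
// a path that was never sprinkled on the item simply scores zero.
function scoreFor(itemTags, tagPath){
  return itemTags[tagPath] || 0;
}

scoreFor(postTags, "/projects/tagonomy/");  // → 0.5
scoreFor(postTags, "/play/");               // → 0
```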

Further, there will be an algorithm for various combinations of trees. This is where things get interesting. If the product of two trees, through some inferences using the aliases and imports defined above, could produce an affinity or correlation, then this information is fungible.
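
One simple way such a product of two trees could yield an affinity number is cosine similarity over the weighted tag vectors; this is only an illustrative stand-in for whatever the real tree-combination algorithm turns out to be:

```javascript
// Correlate two users' tag signatures. Each signature maps canonical
// tag paths to accumulated weights; the result is 0..1, where 1 means
// identical direction in tag space.
function cosine(a, b){
  var dot = 0, na = 0, nb = 0, tag;
  for (tag in a){
    na += a[tag] * a[tag];
    if (b[tag]) dot += a[tag] * b[tag];
  }
  for (tag in b){
    nb += b[tag] * b[tag];
  }
  if (na === 0 || nb === 0) return 0;
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

var me  = {"/likes/bicycling/": 1, "/likes/guitars/": 1, "/work/delphi/": 1};
var you = {"/likes/bicycling/": 1, "/likes/guitars/": 1};
cosine(me, you);  // high affinity: these two users are in the same camp
```

Normalizing tags to canonical locations is what makes a comparison like this meaningful across users.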

In order for a system like this to work, there must be a central or well-known way into the cloud of tagonomies. If app vendors all had immediate access to storing user tags, for example, then the tags would appear in lots of apps, be useful, and highly available. People would then begin to expect them in apps. And other portals into this highly prized data could be provided to users as part of a tag/link management service:

  • We'll help you manage your tags,
  • centralize your search,
  • store pointers to your data in the cloud,
  • do it securely,
  • and enable you to sell this to content providers.

The fungible information possessed by users can then be brokered through to content providers (remembering that catalogs are content). I've been bothered by ads, but I have never been bothered by a catalog. Similarly, content providers always want more targeted and receptive subscribers.

As consumers of content containing ads, we are subscribers to ads. It would be really interesting to get paid to be a consumer. What I'm suggesting is that as content consumers, we have something to sell back: our tagonomies. And as a company, it would be really nice to broker that transaction.

Wednesday, December 12, 2012

2012-12-12 : A good day for a proposal for a JavaScript Validation Framework

Being a fortuitous date, 2012-12-12, I thought I'd make a fortuitous post about some work I'm doing in JavaScript. This pattern came into use while I was working with Mark Pahlavan on a design called gdef, for Gui DEFinition. We implemented gdef in Tcl (Tool Command Language). I extended the data structure to support web requirements for form-wizard validation (groups, pages, a nextPage(page) function).

I propose that this structure, updated for JavaScript/JSON, and with some of the lessons learned from my open source web framework, "dynamide", would be ideal to transmit web service requests and futures back and forth.

All URLs are considered to be https URLs, with authentication (web-key), so authentication steps are omitted below.

All code and structure will have a time-to-live (TTL) parameter, so they can be cached aggressively.

The client requests a web service URL.

The server sends back a form to fill in.

  • Default values are in the form, correct for the authorized user.
  • Long lookups can be supported by URL placeholders, or by being tagged as auto-suggest.
  • The type of the form is given.
  • This type will have the actual validator code, so should be retrieved, used, and cached.
  • Inline validators in the form are allowed and encouraged. Where it gets clunky, move code to the validator Type.
  • All validators will work in a JavaScript sandbox, with a private global space [we can do this client-side, right???] (At the least, define the sandbox as function sandbox(window, $, document), call it as sandbox({}, {}, {}), and lock down arguments.caller, etc.)
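
The parenthetical in that last bullet can be sketched directly. Note this is a containment trick, not a real security sandbox: code built with the Function constructor can still reach globals through other routes, which is why the bracketed question above is a fair one.

```javascript
// Run validator code with 'window', '$', 'document', and 'value' bound
// as parameters, so the code's references to those names resolve to our
// empty stand-ins rather than the real globals.
function runSandboxed(code, value){
  var fn = new Function("window", "$", "document", "value", code);
  return fn({}, {}, {}, value);
}

// A field validator that cannot see the real document:
runSandboxed("return typeof document.cookie === 'undefined' && value < 550;", 3.33);
// → true
```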

The client looks at the form type and sees if it has the form validator in cache, etc.
The client allows the user to fill in the data structure, using some View.

View applies validator at will.

Client submits form to service.

Note that the service is an authorized view into the data available on the server. Only the allowed fields pass through. The application can send a full form, but will mark fields as required based on the state on the server. The client app is allowed to show any of the fields and validators, but is only required to show required fields. Since validation and lookup code are provided in the form or form TypeValidator, the form can also contain calculated fields, and the View can choose to display these as well.

This structure is an example, but is pretty rough. I'll be updating it.

Sandbox :: functions provided by the framework
function next(pageArray, page){
    // no page yet (or the INIT sentinel): start at the first page
    if (!page || page === "INIT"){    // JS strings have no .equals()
        return pageArray[0];
    }
    var nextP = pageArray[pageArray.indexOf(page) + 1];
    if (nextP){
        return nextP;
    } else {
        return "DONE";
    }
}
data structure :: the thing that gets passed around.

FormType: {
  pages: ["page1", "page2"],
  fieldGroup: ["field1", "field2", "field3"],
  fields: {
    "field1": {
      type: "",
      value: "",
      // Only one of code / url / registeredType is used.
      validator: {code: "", url: "", registeredType: ""}
    },
    "field2": {value: "3.33"}
  },
  nextPage: function(page){
    return next(this.pages, page);
  },
  errorPage: function(page, e){},
  validateField: function(oldValue, newValue){
    if (newValue < 550){
      return true;
    }
    return {ok: false, reason: "should be less than 550"};
  }
}

  • Work out if the url can be used as a backup.
  • Work out if registeredType (namespace) is any different from a url.
  • Rules are open and shipped in the record-type struct.
  • type: "RecordType/com.myco.RenterDatabase.LeaseForm"
  • Rules only interact with the js engine and Sandbox.
  • Sandbox should have String, Math, and array goodies, plus jspath, and mongo-like queries.
    • jspath
    • underscore.js
      It would be important to get the set of functions in the sandbox right in the first release.
    • LiveValidation
      This utility seems to have all the single-field cases nailed.
    • Some mongo-like operators may or may not be essential. Compare with underscore.js and tower stores.
State machine:
        type defined
        code defined
        data loaded
          init code given chance to run
          nextPage(currentPage, error)
          errorPage(currentPage, error)

POST_DONE should have a message or signal that triggers a state or page from the type that shows where to go when done with this app or pageset. So the POST_DONE state would look at the signal and the type and figure out the internal page, or the external URL to go to.

errorPage should be called first, so if you want to handle both in nextPage(currentPage, error) you can say: errorPage = nextPage;

Your app can decide if errors prevent submission.

So now that we have the data structure defined, and its validator, we can run the same validation server-side, and also throw an enhanced, server-side validator at it. If we sign the struct's MD5 sum, or use some such technique, we could vouch that that object passed our validation, and send the result off to some other service.

The data structure plus the data definition are then an application. They lack a view, display rules, and i18n, but they can be driven completely like an application or a complex business object.

The page names are logical pages, so they show grouping of fields, and also give a namespace to pull values from for a View with subviews. Some fields are input; some fields are display only. So a view can generically or specifically represent this object using any number of templating or widget frameworks.

TODO: nextPage could contain page names or URLs, in which case processing can fly off. How to handle this is really a controller/view question, but should be written to support dialogs, etc.

Saturday, June 25, 2011

Managing Simplicity (TM)

Software moves too fast for maturity.  So software managers attempt to "Manage Complexity."  There is an entire industry built around managing complexity: version control, frameworks, Spring, bug tracking, wikis, JIRA, etc.

Managing Simplicity

The idea here is to prefer simple solutions, hoping that the simple solution will pay off in the long term by reducing current and future design and maintenance cost.
Managing Simplicity posits that this will be true enough of the time to make it a default choice.  Of course you have to look at opportunity cost, and weights of risks, rewards, and efforts.

Here are the areas where Managing Simplicity spells out steps that are proven to save effort in the long run.

The Refactoring Rule of Three.

The first time, just write the code to get it to work.
The second usage, copy the code, make it work for the new instance.
The third usage, look for commonality, and do a simple refactoring.
So any time you have three or more blocks of code that smell of Cut-and-Paste, look to refactor the code in the simplest way to eliminate redundancy.

When you get to three redundancies at the next level, then consider fancier refactorings.  That is, the following code is redundant, but OK, because it is only done twice:

  ContainerNode copyright = new ContainerNode("Copyright");
  copyright.setDesigner(new CopyrightDesigner());
  TreeItem copyrightItem = ContainerNode.createTreeItem(copyright);

  ContainerNode websites = new WebsitesContainerNode("Websites");
  websites.setDesigner(new ListDesigner(websites));
  TreeItem websitesItem = ContainerNode.createTreeItem(websites);

When you get three of these babies, do a simple refactoring, which is to move the code into a factory method, passing in the things that are different.
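
That factory refactoring might look like this, sketched in JavaScript rather than the Java above, and with illustrative names; the point is simply that only the varying parts are passed in:

```javascript
// The factory takes only the things that differ between the call sites:
// the node's name and how to build its designer.
function makeTreeItem(name, makeDesigner){
  var node = {name: name};            // stands in for new ContainerNode(name)
  node.designer = makeDesigner(node); // stands in for setDesigner(...)
  return {label: node.name, node: node};  // stands in for createTreeItem(node)
}

var copyrightItem = makeTreeItem("Copyright", function(){
  return {kind: "CopyrightDesigner"};
});
var websitesItem = makeTreeItem("Websites", function(node){
  return {kind: "ListDesigner", target: node};
});
```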

However, when you get to three different variations of how you call your factory method, look to fancier things, like Interfaces, Introspection, Spring, and so on.

Friday, June 17, 2011

Using Google Web Toolkit (GWT) for UI and RPC for the Anarchia Author interface

This is a bit of a ramble, but contains links to get started using GWT, and some notes on how to use Eclipse and IntelliJ IDEA for GWT development.

I plan to deploy using a templating engine for browsing all the content.  But the Author interface (that is, the interface that authors must use to upload content, manage identity, choose frame layouts, make story trails, and assign copyrights) is a real app.  It has business rules, editing of one-to-many relations, and many of the things that make application development and deployment difficult via a web interface.  I had planned to deploy this app as a pure Java client, but the tagland client still needs a heap of work.  I could use dynamide for this app, but there is a lot of back and forth to the server, and dynamide, being form-based, would make this a bit too web-1.0.

Enter Google Web Toolkit (GWT).  I decided on using GWT for the Author interface because it promises cross-browser functionality and AJAX communication to the server.  Downsides: it is complicated; it has a slow load time; it has a number of moving parts that are inaccessible.  I decided that the slow load time was acceptable for the Author interface, since it is basically an admin application--designed to be used by a limited number of authors who are motivated to use the system, compared with the browsing interface, which will be used by anyone on the web.  After some initial testing, I also found that the complications seemed to be in the build and deploy process, and not in the browser environment, so they could be managed like any other software complexity.  I also found during prototyping that all of the abstractions I really wanted were supported.  GWT supports serializing real Java objects across the wire with its own RPC mechanism, and supports construction of the UI by composition of components.  Another plus of GWT, for a Java programmer, is that all the code and logic are in Java, so you get compile-time checking of everything, strong typing, exceptions, etc.  This goes a long way towards managing simplicity.  On balance, it seemed that letting GWT manage that complexity was the way to go, rather than managing it directly with something like jQuery.

On to the things I learned while prototyping GWT.

First, you can see the actual code I used here.  There are some dependencies, so you'll probably need to check out the whole project.  Look in module anarchiaAuthor.  This project compiles in IntelliJ IDEA 10.5.

I also got this working in Eclipse, and that version is up on the official SourceForge page for Anarchia.

Which IDE?

GWT is supported in Eclipse and IntelliJ IDEA.  Eclipse supports a UI designer tool which is very cool.  They both generate sample applications for you, which is the only way to get started.  The flavor of AsyncCallback that they use is a bit different: Eclipse generates inline anonymous classes; IDEA generates named inner classes.  I'm a fan of IDEA, so once I established that they both worked, I moved on to developing all my code in IDEA.  IDEA has slightly better inspections and jumps between Java, XML, and HTML files in the project a bit better.  Eclipse generates the deployment files correctly, which is to say, all the files with MD5 sums in their names, which are browser-specific files for deploying to all supported browsers.  In IDEA 10.5 (they say they'll fix this in 10.5.1) you need to add two jars to your classpath to get this to work.  In Project Structure > Libraries > gwt-user > Classes, you need to click "Attach Classes", then point to your local GWT installation dir, where you must find validation-api-1.0.0.GA.jar and validation-api-1.0.0.GA-source.jar.  Accepting these files should add the jar files to the classes node (not to the sources node).

To get started in IDEA with GWT, I followed this screen-cast *very* closely, using the pause button frequently.  Basically, you install the GWT plugin, then create a new Java module, and then specify that it uses Google Web Toolkit, making sure to check the checkboxes that ask if you want it to create a source directory and if you want it to create a sample application.

I read the documentation, including the tutorial, which is very well written.

I found this article to be thorough and well written.  It also got me past one of the big humps, which was how to let GWT know about dependent modules so I could use GWT as a layer, rather than an all-in-one solution.  This is important if you want to keep from copying code or source jars around, and if you want to keep from getting everything wrapped up in GWT land so that you can't pursue other implementation strategies.  I found one section of it particularly helpful.

There are two things going on when you import modules.  First, you have to get your IDE to know about the modules.  Second, you have to get GWT to find the source code, so that it can translate it into javascript, which is the magic that makes GWT work.
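
On the GWT side, both concerns are declared in the importing module's .gwt.xml descriptor.  A sketch, with hypothetical module and package names (the real names depend on your project layout):

```xml
<!-- anarchiaAuthor.gwt.xml (module and package names hypothetical) -->
<module>
  <!-- tell the GWT compiler about the dependent module's translatable code -->
  <inherits name="com.example.anarchiaobj.AnarchiaObj"/>
  <!-- where this module's own client-side (translatable) source lives -->
  <source path="client"/>
</module>
```

The IDE dependency setup described below is the separate, second half of the job.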

The above article shows how to do this in Eclipse.  In IDEA, you should also follow the article on how to do GWT module imports, but then you must create an IDEA module in your main project, call it a Java module, and add dependencies on lib directories.  In tagland, tagland is the main project, anarchiaAuthor is the GWT module (a Java module with the Google Web Toolkit feature selected on the second or third page of the New Module wizard), and anarchia-obj is a Java module that contains just the serializable, GWT-friendly POJOs.  (POJOs are GWT-friendly if they don't import things that can't be serialized and sent to the client.  The list is here.  You'll find that some things you might expect are not included.)  anarchiaAuthor, being a GWT module, has a WEB-INF/lib directory, and that must be added in the project as a lib directory, and that lib must be added as a dependency in the module settings.  anarchia-obj, being a POJO module, must be added as a dependency in the module settings for anarchiaAuthor.

"No source code is available for type; did you forget to inherit a required module?"

You'll get these kinds of errors if you don't have the source code for the classes you want to serialize included properly in the IDE and in the imports statement for GWT.  For IDEA, you sometimes need to restart IDEA after fiddling with the GWT imports statements.

You will get all kinds of strange errors if you try to send anything across the wire that cannot be serialized and emulated by GWT.   In IDEA, be sure to look on the Modules tab of the Run window for the log.

You will also get errors if you include jars in your lib directory that have older SAX parsers.  The solution is to clean out your lib directory, and add things slowly until the error appears.  These can look like this:

[WARN] Unable to process 'file:/Users/laramie/Library/Caches/IntelliJIdea10/gwt/tagland.90e8b9ba/anarchiaAuthor.4b3edb74/run/www/WEB-INF/web.xml' for servlet validation
javax.xml.parsers.ParserConfigurationException: AElfred parser is namespace-aware
    at com.icl.saxon.aelfred.SAXParserFactoryImpl.newSAXParser(
    at
    at
    at

This gives you a clue that can help when figuring this stuff out.  Note the location of the WEB-INF directory.  This is the deployed location, so go in there, look in WEB-INF/lib, and see what has been deployed.  In my case it was extra SAX parsers that were incompatible with the one GWT wanted to use.

I also found this blog to have lots of info.

Monday, April 11, 2011

Of Menus and Spaces

Menus are hierarchical. Non-cyclic, they may contain aliases.

Menus are installed in a space. A menu is just a text file or collection of files, capable of storing hierarchy structure, and all data and algorithms to invoke an action based on a command chosen in a context.

Multiple menus can live in one space. Menus can call each other in the same space with a different security context than calling menus in other spaces.

The context is in the space. The context may be a sandbox with a namespace of variables and commands. But that context is in the space.

The space is just data; can also be represented as a directory of files.

Ultimately a space lives in a JVM; there is an uber-sandbox of Java class files to provide an API to the space, and to the commands in the space.

A space is data. These data can be closed and exported to another machine.

So very quickly we have the problem of synchronizing spaces.

Some spaces will be very data-centric, will be about storing data, or metadata. Others will be more attuned to a user's desires. Data-centric spaces will have p2p, federated backup and sync, or publish-subscribe strategies for synchronization. The personalization spaces will have to follow the user more closely, e.g., actually stored on the user's person, say on a cell phone SD card. Or perhaps biometric or passphrase-based user identification for opening a write session to a hosted space. Or maybe spaces allow multiple users to log in and change a space. So you log into a space, and are perhaps simultaneously logged in to other spaces you need. Your master local session keeps track of these spaces, and marshals data between spaces based on permissions.

All messages to menus, API objects, etc., use XML payloads for input and output objects. The function name, plus the types (schemas) of the objects it requires and returns, defines the API.
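
A message of that kind might look like this; the element, function, and schema names here are hypothetical sketches, not a defined format:

```xml
<!-- the function name plus the input/output schemas define the API -->
<call function="openSession">
  <input schema="urn:example:OpenSessionRequest">
    <space>contacts</space>
    <mode>write</mode>
  </input>
</call>
```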

Thursday, March 24, 2011

Links to Java projects that might help

I wrote one of these for Vestek, but this might be useful, seeing as it is open source. Anyway, it is nice to have a utility which tells you in which location on your classpath a Java class is actually found.

This is the bomb. Archives as federated filesystems. (Zip, tar, jar, etc.) This will be needed by diorama for dealing with personal spaces. My concept of Spaces is to mount a jar file, then write to it, then unmount and share it, with whatever level of encryption. Your project's tips-n-tricks, help files, etc., would be in a public space. Your contacts would be in a private, encrypted space. Then those spaces get sync'd through your server/account, and shared as appropriate. This library, TrueZip, looks perfect for the implementation because it handles multithreaded access to the archive at runtime.

Monday, February 28, 2011

My PowerMac is now a $4000 brick

I just tried to install Java 1.6 (i.e., Java 6) on my 4-processor PowerMac, after having followed the instructions and paid $150 for a new Mac OS X Leopard (10.5.6) operating system, and applied all the updates up to 10.5.8 in order, including all the Java updates from Apple. "Oooooooooh Sooorry, you can't run Java 6 because *you don't have an Intel processor!*" What!?

Without Java, my Mac is good for running ProTools, which I do. But without Java, I cannot do the work that I get paid to do, which is to be a Java programmer. So this machine is useless if I want to use it for work. But also, any Java app written since 2008 has the right to rely on Java 6. So a whole raft of applications can't be deployed on a PowerPC-based Mac.

Now hold on, you say. Your PowerMac G5 is 6 years old! (Never mind that Java 6 was released the same year my G5 was.) But, I say, you haven't looked inside this machine. It easily cost twice what commodity hardware would have cost me at the time I bought it. It is made from machined aluminum. It has a liquid-cooled core with a radiator. It has machined, well-designed, beautiful, expensive parts. This machine was built to last twenty years if you keep the dust out of it. And Apple built a reputation on backwards compatibility and support. Well, that's all gone now, it seems. They are playing the same game Microsoft is: every two to three years you should buy a new Mac, preferably in the $3,000 - $4,000 range. Because it's better hardware. It will outlast your crappy Dell, which dies after two years because of crappy, proprietary power sub-systems. But what does it matter if the hardware lasts, if the OS doesn't get upgraded?

Also, as I see it, I cannot now use my PowerMac if I want to use the best programming language in the world, one that is widely accepted, and may be one of the languages that we could have standardized on. Except everyone wants to kill Java. Microsoft wanted to kill it so much that they intentionally broke their own implementation, and wrote a competing look-alike language, with the help of a hired gun: Anders Hejlsberg, the designer of Delphi from Borland, who contributed in a big way to the Java component model. Oracle wants to kill Java, too. They wanted to kill it so badly, they bought it along with Sun Microsystems, and made it even more proprietary than Sun had. Oracle allows us a quasi-open-source solution called OpenJDK, which will build on Linux, but lags behind the main development line. This appeases folks, and is mostly available and free, but it means they retain control. What the world needs is an open source language as robust, secure, mature, fast, and good as Java, that is not owned by any for-profit corporate entity.

Apple wanted to kill Java so badly, they insisted that no one else could build Java for the Mac. Then they don't release security updates when the fixes are available. Then they decide not to support Java 6 on non-Intel Macs. The tech support lady, when I asked her if I now had a $4000 brick, said that if I wanted to run Java 6 I'd need to upgrade my entire system to Intel. Since when is buying a new machine called "upgrading?"

Then there's Apple's refusal to distribute Java on the iPhone. Tiny devices that are intermittently connected to the network are what Java was designed for. Most phones have Java. The Android phones are making a smashing success using Java. So why doesn't Apple want to support Java on their phone?

So why do I claim that everyone wants to kill Java? I don't have a smoking gun, just this: if you have Java, it's one step closer to not needing a branded operating system. Java is a system you can run on a computer that will run many kinds of applications, and replaces many core operating system features. And does it in a particularly device-independent, OS-independent, network-aware way.

But you can't make money selling operating systems that don't have some sex appeal, and you can't have sex appeal without branding. So coming up with 5 new apps that flip pages in 3D is more important than providing security updates and version updates to core frameworks. Especially if those frameworks don't have happy little nibbled apple cores floating in a rotating 3D purple-starred galaxy.

So the Hell with Apple, and Microsoft and Larry Ellison. Software moves too fast for maturity. That's not Larry's fault. It's just that Larry, Bill, and Steve know this and have figured out how to make money on this mayhem.