Towards World Wide Web 2.0
The following is an email response to a friend at schema.org regarding an initiative called The World Wide Graph (w2g).
Thanks for the link! RSS is a great protocol (which I don't intend to redesign or replace), and the interfaces you linked me to seem to make great use of it -- I do, however, hope to push the web, in its entirety, in a slightly different direction. RSS is a great way of subscribing to content on specific sites, but not necessarily a way to see what's happening (the derivative) across the entire web. For instance, I probably wouldn't use RSS if I wanted to see all the places across the web where a specific "tag" (e.g. a person) was mentioned. What I imagine with w2g is something much more akin to a public "Facebook tag", where you can register a universal tag (across the entire web) and be notified when any website on the internet uses *your* tag, or a tag you follow. Many of the implementation details are described on https://graph.global.

TL;DR -- the universal document index (the web) and the entity database should be public, inherent properties of the World Wide Web, implemented in the style of DNS, and not centralized systems maintained privately by search engines. W2g is an attempt at creating such a universal entity database.

Inspiration:

My central goal is to further the vision of Paul Otlet, et al., and head toward an amalgamated World Wide Web (a Universal Knowledge Repository) freed of arbitrary, discrete "document" boundaries. A world wide web which isn't experienced through discrete, independent "web pages" (even if the unit of content an author publishes is still a "document") but instead through the interaction and cooperation of the content within them.

More pragmatically, I am suggesting the emergence of a public layer over the existing web which enables users (independently, via applications such as browsers or GNU utilities[1], in a decentralized fashion; i.e. without a search engine) to make queries which generate dynamic views/compositions of pertinent content. A world wide web wherein protected access to an underlying, versioned, universal indexing system is granted as an inherent property of HTTP itself. I think the Internet Archive is capable of making this a reality. And with it could come a new generation of tools for consuming and republishing this content, and for browsing, visualizing, and discovering paths through and between knowledge. And not just knowledge -- our comprehension and understanding of knowledge[2].

Dissolving the boundaries between discrete documents (via semantic tagging and publicly accessible indices) and creating a public, distributed, universal entity graph are just the first steps toward a much brighter future. With this foundation in place, we can begin calculating and dynamically composing paths through knowledge. We can dynamically generate curricula: responsive sequences of resources for individuals to learn new things. We can establish previously unexplored connections between disparate branches of knowledge, dynamically fact-check knowledge against its neighbors, automatically infer citations, explore the epistemological and pedagogical provenance of knowledge, and even craft self-organizing systems which improve our classifications and produce new knowledge over time.

Toward this end, I'm working with Ted Nelson, Drew Winget (Stanford digital library), the Internet Archive, and some folks at Google on planning achievable first steps. The first of these is exploring how a *public* universal entity graph that supports such a future can be created using existing standards.
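To make "existing standards" slightly more concrete, here is a minimal sketch (not w2g itself) of how a page's mention of a universally identified entity could be expressed with RDF and schema.org vocabulary. The entity URI, the page URL, and the choice of Python's rdflib are my own illustrative assumptions:

```python
# A minimal sketch of semantic "tagging": describe an entity once, then assert
# that a page mentions it, using schema.org vocabulary over RDF.
# The URIs below are invented for illustration; only the vocabulary is real.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

SCHEMA = Namespace("https://schema.org/")

g = Graph()
g.bind("schema", SCHEMA)

# A hypothetical universal identifier for a person ("your" tag).
person = URIRef("https://graph.global/entity/jane-doe")       # assumed URI scheme
page = URIRef("https://example.com/articles/open-knowledge")  # any page on the web

# Describe the entity once, publicly.
g.add((person, RDF.type, SCHEMA.Person))
g.add((person, SCHEMA.name, Literal("Jane Doe")))

# Assert that the page mentions the entity; a public index could aggregate
# such statements from across the web and notify subscribers of the tag.
g.add((page, RDF.type, SCHEMA.Article))
g.add((page, SCHEMA.mentions, person))

print(g.serialize(format="turtle"))
```

An open index that aggregated statements like these from across the web is, in essence, the public entity graph described above: anyone could ask "which pages mention this tag?" without going through a private search engine.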
From the literature I've read and the experts I've queried, RDF (as a general concept), schema.org (as an implementation of RDF), and services like w2g (physical implementations of open/public universal entity graphs) seem to be one approach toward achieving this first step.

Without context, this whole mission likely seems a far-fetched fool's paradise -- and that may be so; but while there remains a chance, I intend to devote my entire life to the mission of advancing *public* systems for universal knowledge, and I'm confident that over my lifetime I can move the needle, or at least continue laying a foundation which empowers others to do so.

I'm sure you're already overwhelmed by my "essay" of a response to your simple question (which hopefully by now I've answered): "What problem [am I] trying to solve?" If you are passionate about these topics and interested in people's thoughts on the matter, I've begun compiling a compendium (something of a curriculum) on Universal Knowledge which (based on historical precedent) will likely outlast my lifetime -- https://docs.google.com/document/d/1upjXuPM_rVbZcFm3aqdg-T8fHCClxnfMAMVTIaqt-_4/edit#

Beginnings:

Originally, when I started my PhD program, I was working on an open conversational search engine (similar to DuckDuckGo) called GnuAsk. It addressed my two main contentions with existing search engines.

First, I was annoyed with being limited to a single query as the medium for interfacing with a knowledge base. At the time, every query was evaluated independently (unlike Markov chains, previous searches did not inform future ones) and converging on the correct query was an arduous process. I imagined an alternative engine with which users conversed using natural language. The engine was able to disambiguate via conversation and used models to generate accurate queries based on the progression of the user's dialog. The result of each conversation was a directed chain of resources (from which intermediary answers had been obtained), and ultimately an answer to a specific question. This "list" (a DAG) was then compressed into a single URL called a GNUrl (like a tinyurl) which preserved the provenance trail of the conversation and could be shared with friends as a single link.

Second, search engines are able to find and compile lists of relevant "documents", but in the average case their job stops there. What should happen is that the engine directs the user to the exact piece of knowledge they are after, along with other relevant information they might not have known existed. To their credit, DDG Zero-click, Google Now, and others are beginning to address this concern, albeit in a centralized fashion.

Centralized "portals" or services (e.g. Google and GnuAsk) which expose only a small fraction of their infrastructure, and only through restrictive APIs, are good first steps, but I now realize I was building features at the wrong level. My goal has since matured to address how we might enrich HTTP as a protocol and ecosystem, rather than continue the entrenchment of these monolithic, centralized services. Someday, I hope the world wide web becomes more like a public library, where its citizens have access to the underlying indices and knowledge graphs -- after all, such sentiments were a great early inspiration for the web.
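As a purely hypothetical sketch of what that kind of citizen access might feel like from ordinary code, imagine the public index exposed over plain HTTP; the endpoint, query parameter, and response shape below are all invented for the sake of illustration:

```python
# Hypothetical "grep for the web": ask a public index which pages mention a tag,
# then filter/map/reduce over the results locally, with no private search engine
# in the loop. The endpoint and JSON shape are assumptions, not a real API.
import json
import sys
from urllib.parse import quote
from urllib.request import urlopen

INDEX = "https://graph.global/api/mentions"  # hypothetical public index endpoint


def mentions(tag):
    """Yield (url, snippet) pairs for pages the public index says mention `tag`."""
    with urlopen(f"{INDEX}?tag={quote(tag)}") as resp:  # assumed query interface
        for record in json.load(resp):                  # assumed: a JSON array of objects
            yield record["url"], record.get("snippet", "")


if __name__ == "__main__":
    tag = sys.argv[1] if len(sys.argv) > 1 else "paul-otlet"
    # A small filter/map/reduce: keep only .edu pages and count them.
    edu_pages = [(url, text) for url, text in mentions(tag) if ".edu/" in url]
    for url, text in edu_pages:
        print(url, "--", text)
    print(f"{len(edu_pages)} .edu pages mention '{tag}'")
```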
When this day comes, I'll look forward to the amazing unix-style utilities which filter, map, and reduce over the internet as a whole, and which empower everyone's computer, and all of its applications, with ubiquitous search.

Footnotes:

[1] I wrote an essay on this called "What the Browser is Missing", which imparts partial blame to browsers and browser vendors, but I believe an important part of the equation lies with the HTTP protocol and how we as a society have grown to (mis)use it.

[2] Imagine a utility or interface like https://math.mx, capable of producing a heatmap overlay to illustrate one's relative understanding / coverage of different mathematical domains.