Pages from the fire

Generative AI: Search.

Last updated: 2026-02-01

Generative AI is the end stage for search that I had not anticipated.

Classic search was one evolutionary step up from age-old library research. The internet, hypertext and answer forums were another evolutionary step for our library (at least at first; eventually the information began to get drowned out).

Classic search was still the library model. You had some idea of what you were searching for, you decomposed it into keywords, there was an index, the index spit out sources and then you skimmed through the sources looking for the relevant facts.
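A toy sketch of that library model, just to make the steps concrete. Everything here is made up for illustration: the three-document corpus, the names `build_index` and `search`, and the scoring (count of matching keywords), which is far cruder than anything a real search engine does.

```python
from collections import defaultdict

# Hypothetical toy corpus: document id -> text.
DOCS = {
    "doc1": "replace a broken toilet valve",
    "doc2": "reduce the size of a pdf with ghostscript",
    "doc3": "ghostscript pdf options explained",
}


def build_index(docs):
    """Build the index: map each keyword to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Decompose the query into keywords and return a ranked list of sources.

    Documents are ranked by how many query keywords they contain,
    with ties broken alphabetically by document id.
    """
    scores = defaultdict(int)
    for word in query.lower().split():
        for doc_id in index.get(word, ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: (-scores[d], d))


INDEX = build_index(DOCS)
```

Calling `search(INDEX, "reduce pdf size ghostscript")` hands back document ids, best match first. The skimming of those sources for the relevant facts is still left to you.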

Over time the search term parsing got better and better - we moved on from keyword matching and boolean operators to fuzzy search (I still remember getting excited when search would autocorrect typos!). At some point we started getting back an excerpt, which was also amazing and helpful.
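For a feel of what typo correction involves, here is a minimal sketch using `difflib.get_close_matches` from Python's standard library. The vocabulary, the function name `correct_query` and the 0.8 similarity cutoff are all assumptions picked for the example. I don't know what real search engines used, and it was surely more sophisticated than this.

```python
import difflib

# Hypothetical vocabulary of words the index knows about.
VOCABULARY = ["ghostscript", "skylake", "toilet", "valve"]


def correct_query(query, vocabulary=VOCABULARY):
    """Replace each query word with its closest known word, if one is close enough."""
    corrected = []
    for word in query.lower().split():
        # n=1: keep only the best match; cutoff=0.8 is an arbitrary threshold.
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=0.8)
        corrected.append(matches[0] if matches else word)
    return " ".join(corrected)
```

Words with no sufficiently close match are passed through unchanged, so an unknown term is never silently swapped for something unrelated.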

But all this was still evolutionary: the result was still a ranked list of sources.

But Gen AI search is revolutionary. We are no longer going to the library. We are asking questions of the oracle.

We ask a question in plain English, often poorly formed, formatted and formulated, and we get a neat summary as an answer, with references at the end.

We’re getting the index but we’re getting a summary with it. The oracle.

When this AI summary first rolled out it wasn’t very good. I would skip past it like I would skip past the other crap that began to appear at the top of search results around 2010.

In less than 6 months something changed. The summaries got more reliable. The hallucinations got fewer (If this had been a human librarian we’d say the librarian mis-remembered less or made fewer mistakes). And, we also got the citations so we could cross-check.

Recently I asked two fairly esoteric questions of an LLM-powered search at work.

The first was: “What are the cpu architectures more recent than Skylake?” The engine gave me an accurate list of Intel and ARM micro-architectures. It also didn’t get tripped up by my misuse of “CPU architecture” when I meant “CPU micro-architecture”.

My other question was: “What the last branch record depth was for Cascadelake?” Its answer was “32” and it pointed to Intel’s documentation for that processor family. I knew this was true of Skylake, and it seemed reasonable for Cascadelake not to regress, but I wanted to be sure.

Unfortunately all my traditional searches through that document failed. Finally a human expert confirmed this answer and acknowledged that it was “weirdly hard to find,” being buried in the specifications PDF that the LLM had linked me to.

The Oracle, of course, is not literal magic. It’s some tenth level technological wizardry, but so is the wheel. It’s still subject to GIGO. If we feed it nonsense posts and information, it will spit out nonsense. If we feed it poorly written English, it will spit out poorly written prose. But that is the same as search.

The Oracle is a much better search, and is much, much closer to the ideal of knowledge retrieval.

Knowledge silo-ing, ownership, free internet etc.

We will eventually have to sort out the messy social questions of who owns the data and the “service.”

To me the development of the internet was amazing. No matter your circumstances you could read the thoughts of anyone else in the world. You could avail of the expertise of top people in their field. It really was the next step of the printing press.

I could go to YouTube and discover how to replace my toilet’s broken valve. I could go to Stackoverflow and learn the magic incantation to make Ghostscript reduce the size of my PDF.

Now the Oracles have digested this information. But the Oracle isn’t quite open. And how long will the Oracle be free? And if someone makes money off the Oracle, shouldn’t the person with the expertise make some money?

The last question is not so hard. Most of us who answered questions on Stackoverflow and put up YouTube videos didn’t really expect payment, except perhaps for a bit of internet fame.

Perhaps the next Stackoverflow/YouTube (DIY edition, not the TV edition) will be a site that explicitly promises to have a perpetually free to use Oracle based on the content we give it for free. That was kind of the social contract with SO and YouTube.

I don’t know how this will evolve, but I think it will follow the classic curve we’ve seen with search engines and forums and code forges: It’s free at first, then the (new) owners try to make money, they make it terrible, a new service springs up, everyone moves there, and the cycle repeats.

It ain’t perfect, but it works.