A Timeline of Recent Search Engine Events
(Or as My Father Would Put it, Where Did My Google Go?)
by Kenova Ceredo (kenova.ceredo@protonmail.com)
What follows is a timeline of events that is factual to the best of my knowledge.
The timeline itself contains no opinions, although there may be some in the conclusion. Most of the events happened in late 2015 and 2016, but there are also some details about the end of the Gigablast search engine, which happened a couple of years later. A few of these events are very obscure, even though they may have had a large impact. I hope you enjoy reading it, and perhaps learn something new.
By the way, if you find any of this interesting, you are allowed to take a picture, scan, or screenshot of this article and share it willy-nilly around the Internet, or as a facsimile transmission on the HF bands. 2600 Magazine is okay with that too, because they printed this permission notice. If, however, you don't find it interesting, you aren't even allowed to read it aloud to other people, and you are encouraged to forget about it completely.
2015 - 2016
July 1, 2015: The company behind the Gigablast search engine announces that they have entered a partnership with the Internet Archive, and are going to use their technology to index the Internet Archive's vast collection of archived web pages. At the time, the Internet Archive had about 485 billion pages in its collection, and the plan was to index it in order to make "the biggest search engine ever created." (www.gigablast.com/blog.html)
Sometime Between November 13 and 26, 2015: Gigablast removes their announcement about the Internet Archive from their blog page. As far as I can tell, the Internet Archive never made an announcement about the agreement. There is no more news about this, and apparently "the biggest search engine ever created" is canceled. (gigablast.com/blog.html)
February 2016: Google decides to phase out the "Google Search Appliance" product, which was essentially server hardware running Google's software that allowed the owner to index and search through large document collections. The largest model could store and index up to 100 million documents.
Sometime in Early 2016: Google starts limiting all searches to about 400 results, which is about 40 pages. Before this, the limit was about 700 results, and a while before that it was about 1000 results. This is according to anecdotal evidence from a "Diamond Product Expert" on support.google.com. (support.google.com/websearch/thread/25885806/header-indicates-thousands-of-results-but-only-110-are-shown)
As of early 2025, all search engines that I know of limit the number of results that you can see. Some engines like Bing and Mojeek deliver about 1000 results before stopping the user from seeing more, but most of them deliver far fewer. Brave Search, for example, only delivers about nine or ten pages, which translates to about 170 results. Mwmbl.org (a small project with only about 600 million pages in its index) only returns about 80 results. To be clear, these limits stay in effect even for very broad searches, like "cheese" and "pancake". Unfortunately, I don't have any dates for when the limitations on search engines other than Google started.
February 6, 2016: All Seeks nodes become unusable. Seeks was an open-source metasearch engine that re-ranked results based on user activity.
April 2016: Microsoft starts to open-source BitFunnel, which seems to be at least a large part, and possibly the entirety, of their indexing system for the Bing search engine.
April 2016: Sylvain Zimmer starts the Common Search project (commonsearch.org), which was meant to rely mainly on Common Crawl data for its index. Some ranking data from this project was used by Common Crawl (mentioned in this post: commoncrawl.org/blog/august-2016-crawl-archive-now-available), and Greg Lindahl (who is now CTO of Common Crawl, but didn't appear to be at the time) was listed as an advisor. There are a couple of small search engine projects still running today that use Common Crawl data: alexandria.org and chatnoir.eu, but the latter only uses two crawls from Common Crawl, and the former appears to draw from a similarly small index.
Two crawls is a tiny fraction of what Common Crawl has available. I do not know if the plan was for Common Search to use the entirety of the available Common Crawl data, but if it was, and if the project were still running today, it would have about 250 billion indexed pages, which would put it in the same league as Google and Bing. Some of the pages from commonsearch.org are available on the Internet Archive, but a more complete and current archive of the site is available on GitHub: github.com/commonsearch/cosr-about/tree/master/content
August 2016: The last blog post for Common Search is posted. No more work is done on the search engine past this point. There is currently no mention of it on the founder's website (sylvainzimmer.com).
October 2016: Microsoft appears to abandon work on open-sourcing BitFunnel. No more blog posts are made about their progress past this point, even though no official statement has been made about canceling the project. A paper about BitFunnel is published in August 2017, but it does not appear that any work is resumed on the project. (bitfunnel.org)
End of Gigablast, 2018 - 2023
January 2018: gigablast.com/faq.html is replaced with a blank page. Before it was blank, the page detailed some features and technical specifications of the search engine, and explained how to install the Gigablast software on your own computer and get it up and running, in order to have a local instance of the search engine with a personally constructed index. If you visit an archived version of the page on the Internet Archive, you can actually still download and install the linked Gigablast binaries, because they were archived as well. I have tested the Debian/Ubuntu 64-bit version, and it seems to work well, except that it doesn't support SSL, so any pages that require HTTPS will not be crawled. I have been told, however, that the original version of Gigablast ran behind an Apache2 reverse proxy, which took care of SSL. I have not attempted to set up such a system myself, but it may be a fun project for the interested reader.
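For readers who would rather locate an archived copy of that page with a script than by clicking through the Wayback Machine, here is a minimal Python sketch. It assumes the Internet Archive's public availability endpoint (archive.org/wayback/available) and the JSON layout it returns; the function names and the example timestamp are illustrative choices, not anything taken from Gigablast or its FAQ.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the Internet Archive's Wayback availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query(page_url, timestamp):
    """Build the lookup URL for the snapshot closest to timestamp (YYYYMMDD)."""
    params = urllib.parse.urlencode({"url": page_url, "timestamp": timestamp})
    return AVAILABILITY_ENDPOINT + "?" + params


def closest_snapshot(response):
    """Return the archived URL from a decoded response, or None if absent."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest.get("url")


def lookup(page_url, timestamp):
    """Query the endpoint over the network and return the snapshot URL."""
    query = build_query(page_url, timestamp)
    with urllib.request.urlopen(query, timeout=30) as reply:
        return closest_snapshot(json.load(reply))
```

Calling lookup("gigablast.com/faq.html", "20171201") would ask for the snapshot nearest to December 2017, shortly before the page went blank. Whether a usable snapshot comes back depends on what the archive holds at the time you run it.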
Sometime Between February 26 and March 9, 2022: A small message is placed at the top of the Gigablast home page and blog page, which consists of the words "F*ck all dictators!" beneath the United States flag. Sometime between March 19 and March 23, the words change to "No more dictators!" Sometime between April 6 and April 28, the flag goes away and the words are replaced with "sudo rm -rf /dictators". Sometime between September 11 and October 30, 2022, the message disappears and doesn't come back. I am not sure why these messages were put here, but perhaps they can fuel my readers' speculations as to why Gigablast eventually went offline.
Early April 2023: gigablast.com goes offline completely and permanently with no announcement.
I don't know why most of these things happened. I don't know why many happened around 2016, or if that is just a coincidence. I don't know if any of these people ran into technical issues, or legal issues, or what. All of these outcomes concern me, however, and I would like to know more about all of them.
If you have any comments, facts, theories, hints, tidbits, stories, insights, or anything that seems remotely related to this subject and might be slightly interesting, I want to know about it. Heck, even if it seems unrelated and boring, but this article reminded you of it, let me know. Please contact me at the email address specified under the title of this article, or let the whole 2600 community know about it by sending a letter to the editor. Or do both. You can use an anonymous remailer (if you trust any of them) if you are concerned about identity.
Anyway, I hope I added to your knowledge, and I hope you're healthy and having a great time. I feel an urge to toss around vague statements, so I'll say this: Sooner or later, somebody needs to do something that makes stuff happen. Maybe a lot of people need to do stuff before something actually succeeds. Maybe one of those people could be you. If you try anything, I'd like to hear about it.
For Debian/Ubuntu Linux:
1. Download a package: Debian/Ubuntu 64-bit (Debian/Ubuntu 32-bit)
2. Install the package by entering: sudo dpkg -i <filename> where filename is the file you just downloaded.
3. Type sudo gb -d to run Gigablast in the background as a daemon.
4. If running for the first time, it could take up to 20 seconds to build some preliminary files.
5. Once running, visit port 8000 with your browser to access the Gigablast controls.
6. To list all packages you have installed do a: dpkg -l
7. If you ever want to remove the gb package type: sudo dpkg -r gb
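The steps above end with pointing a browser at port 8000. As a rough way to confirm that the daemon is actually accepting connections before opening the browser, here is a small Python sketch. It only checks that something answers on the TCP port; it assumes the instance runs on the local machine, and the function name is hypothetical rather than part of Gigablast.

```python
import socket


def is_listening(host="127.0.0.1", port=8000, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Because the instructions say the first start can take up to 20 seconds, a False result right after running sudo gb -d may just mean the preliminary files are still being built, so it is worth trying again after a short wait.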