Fans of the popular TV show “Mr. Robot,” which dives deep into the world of shady hackers and the Dark Web that lurks beyond its better-known counterpart, take note: An Israeli startup is serving notice that the hidden is now visible and even your bitcoins won’t shield you from the long arm of the law.
Just as Google crawls the World Wide Web to index the sites we want to find, Bnei Brak-based Webhose has developed a technology that can scour the anonymous part of the Internet accessed through a special browser called Tor. Webhose can even bypass the Captcha systems meant to keep out creepy crawlers.
Law-enforcement agencies, cybersecurity companies and large financial institutions have all expressed interest, Webhose CEO Ran Geva tells ISRAEL21c. That’s not surprising. The Dark Web has long been a sanctuary for those dealing in all things illegal: drugs, weapons, money laundering, pornography and more.
It’s an area ripe for innovation and Webhose has taken the lead.
“There is no Google for the Dark Web,” Geva says. “No one is doing search engine optimization there because the sites on the Dark Web don’t necessarily want to be found!”
Webhose has employed what Geva calls “ethical hackers,” experts hired by companies to find vulnerabilities in their sites, to help get behind the Dark Web’s Captcha protection.
Geva has a warning for criminals using cryptocurrencies: You cannot really hide. Bitcoin, the most popular cryptocurrency, is actually “less secure than a bank,” he points out.
“All transactions are recorded on a ledger that’s shared openly across the network. It’s anonymous, but if you put your Bitcoin wallet address on the visible web – perhaps for a completely different, even legitimate reason – or if you share some identifying detail like your ICQ, Twitter or Facebook handle, we can see what you did with your money. Law enforcement can then ask for the IP of the user that posted and make the connection.”
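The linkage Geva describes can be pictured with a small sketch. It is not Webhose’s method; it only assumes a public ledger of transactions and a set of visible-web posts, and it matches any wallet address that appears in both. Every address, handle and amount below is invented for illustration, and the function name is hypothetical.

```python
import re

# Hypothetical sample data, invented for illustration only.
LEDGER = [
    {"sender": "addr_alice", "receiver": "addr_shop", "amount_btc": 0.5},
    {"sender": "addr_bob", "receiver": "addr_alice", "amount_btc": 1.2},
    {"sender": "addr_bob", "receiver": "addr_shop", "amount_btc": 0.1},
]
POSTS = [
    {"handle": "@alice_example", "text": "Tips welcome at addr_alice, thanks!"},
    {"handle": "@carol_example", "text": "No wallet address here."},
]


def link_handles_to_transactions(posts, ledger):
    """Map each posting handle to ledger entries involving an address it posted."""
    # Every address that appears anywhere on the public ledger.
    known = {addr for tx in ledger for addr in (tx["sender"], tx["receiver"])}
    linked = {}
    for post in posts:
        # Split the post into word-like tokens and keep those that are ledger addresses.
        for token in sorted(set(re.findall(r"\w+", post["text"]))):
            if token in known:
                matches = [tx for tx in ledger
                           if token in (tx["sender"], tx["receiver"])]
                linked.setdefault(post["handle"], []).extend(matches)
    return linked


linked = link_handles_to_transactions(POSTS, LEDGER)
for handle, txs in linked.items():
    print(handle, "appears in", len(txs), "ledger transactions")
```

In this toy data only the handle that posted an address is tied to ledger activity; the other handle never shows up in the result.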
Webhose is not just a science project. The company is earning more than $250,000 a month with a profit margin of close to 50 percent. Webhose clients include, for instance, a fintech startup in France and Luxembourg, a US startup that detects fake news and a cryptocurrency compliance watchdog.
The nicest company that crawls the Web
Most of the company’s revenue is not from the Dark Web. That tool was unveiled only last year. Webhose spends most of its resources searching the “visible” Internet. “We are the nicest company that crawls the Web,” says Geva. “We comply with everything; everyone can see what we’re doing. On the Dark Web, though, it’s a very different story.”
Webhose doesn’t do anything with the data it finds. “We focus on content collection,” Geva explains. “We let our clients analyze the content.”
That partly explains the company’s odd name. Twitter calls the stream of tweets that come from its platform the “firehose.” Webhose applies that same concept to data it collects from the Internet, creating what Geva dubs a “firehose for the web.”
Geva’s previous company, Omgili (Oh My God I Love It), was a search engine for message boards, an area where Google wasn’t so successful back in 2007 when Geva, now 41, started Omgili. As the initial crawler technology evolved, Geva imagined using it to track publicity campaigns on the Internet. He even built an app to do so, but it wasn’t reliable enough.
Geva kept plugging away until the technology worked. This is what enables Webhose to deliver highly structured data to its clients, “much more than the title, snippet and summary Google provides.”
Webhose organizes message-board posts, blogs, news, reviews, ratings and stock data, plus it has historical pricing information on some 20 million products, making it an ideal tool for e-commerce businesses and banks looking for patterns to ferret out fraud.
Webhose stores everything it’s collected in machine-readable format in its own database. Customers tap into that database by creating an XML or JSON feed. Webhose has a nearly four-year archive of millions of websites, although the company only keeps 30 days’ worth of data live on its system, in order to keep searches snappy.
Webhose is not the only company structuring data around the web. Silicon-Valley-based Import.io uses machine learning to return relevant results to customers. DeepCrawl, which has its headquarters in London, is popular with SEO providers.
Webhose sells the data it collects on a freemium basis. A student, for example, can make “up to 1,000 API calls a month, which can return up to 100 results each,” Geva explains. “So in theory you could get access to up to 100,000 articles or message-board posts for free. After that, pricing starts at $50 per month. We have tens of thousands of free users.”
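Geva’s free-tier figure is simple multiplication, which the sketch below spells out. The two limits come from his quote; the constant and function names are made up for this example and are not part of any Webhose API.

```python
# Limits quoted by Geva for free users; the names are illustrative only.
FREE_CALLS_PER_MONTH = 1_000
MAX_RESULTS_PER_CALL = 100


def max_free_results(calls=FREE_CALLS_PER_MONTH, per_call=MAX_RESULTS_PER_CALL):
    """Upper bound on results a free user could retrieve in one month."""
    return calls * per_call


print(max_free_results())  # 100000
```

The number is a ceiling: it is reached only if every call returns the full 100 results.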
The one area where Webhose is not free: searching the Dark Web. So, can Webhose keep a real-world Mr. Robot from hacking the world’s banks and ushering in financial ruin?
“We love that show,” says Geva. “But it’s much crazier than what we’re doing. Still, it feels good to be useful on that front.”