Today, Sysomos turns 100,000,000,000.
Yes, that’s right, the Sysomos platform now lets you analyze across 100 billion conversations. And we allow you to do it within seconds. In celebration of this momentous occasion, we got a cake. Is there any better way for the world’s most advanced (and largest) social analytics platform to mark the milestone of the 100 billionth document added to our system than with a cake lit with a “1” and 11 “0s”?
And if you’ve ever wondered how big 100 billion actually is, consider this: it is 14 times the population of planet Earth, or about the same as the total number of humans who have ever lived. If all the documents in our collection were printed as books, they would fill 750,000 books, which, stacked vertically, would measure 60 times the height of the Eiffel Tower.
To give you a behind-the-curtain look at what goes on at Sysomos, below is an interview with our co-founder and CTO Nilesh Bansal. Nilesh talks about the technologies that help us scale to these unprecedented levels, what works, what doesn’t work, and what the future holds for Sysomos.
How did Sysomos come to be?
I started working on data management research with Nick Koudas in 2005 at the University of Toronto. Nick is an expert in the space with 20 patents and over 150 international research publications. Blogs at the time were “cool”, so we decided to collect and analyze them to see what we could learn. This was before Twitter was born or Facebook opened up to the general public. As a purely research endeavor, we worked on the core technology before we incorporated Sysomos as a company in September 2007.
Building on the foundations established at the University, we spent the next couple of years focusing on use cases, design and user experience, and comprehensive data coverage. Since then, with a solid foundation and continuous innovation, Sysomos has been leading the market, and I am very happy today as we reach the 100 billion document milestone.
How much data does Sysomos process?
The Sysomos platform collects posts in real time from numerous online sources, including blogs, forums, news, Twitter, YouTube, Facebook, LinkedIn and many others. On average, 400 million new non-spam posts are discovered, processed, and stored in multiple copies across the storage subsystem every day. As customers use our products, over 100,000 documents are read per second at peak hours. Today we have reached another milestone: the Sysomos platform now houses 100,000,000,000 documents and lets users analyze them all using ad-hoc queries, all within seconds.
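To put those daily figures on a per-second footing, here is a minimal sketch of the arithmetic. It uses only the numbers quoted above; the function name `average_per_second` is made up for illustration, and the comparison mixes an average write rate with a peak read rate, so treat the ratio as a rough indication only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_per_second(posts_per_day: int) -> float:
    """Convert a daily volume into an average per-second rate."""
    return posts_per_day / SECONDS_PER_DAY


# Figures quoted in the interview.
new_posts_per_day = 400_000_000
peak_reads_per_second = 100_000

ingest_rate = average_per_second(new_posts_per_day)
read_to_write_ratio = peak_reads_per_second / ingest_rate

print(f"Average ingest: {ingest_rate:,.0f} new posts per second")
print(f"Peak reads per average write: {read_to_write_ratio:.1f}")
```

In other words, the system takes in several thousand new posts every second on average, and at peak it reads documents roughly twenty times faster than it ingests them.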
There are 400 million social conversations daily. Where will social media be a year from now? Five years from now?
Social media is here to stay. This will become much more evident as the younger generation that is growing up with Facebook and YouTube reaches adulthood. We are seeing the number of conversations double (at minimum) every year, and I expect we’ll reach more than a billion new public social conversations a day in the next 18-24 months.
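As a rough check on that projection, the sketch below assumes daily volume grows by a constant factor each year. That smooth growth is an assumption made for illustration, and `months_to_reach` is a hypothetical helper, not anything from the Sysomos platform.

```python
import math


def months_to_reach(current: float, target: float,
                    annual_growth_factor: float = 2.0) -> float:
    """Months until `current` grows to `target` under constant
    compound growth (a factor of 2.0 means doubling every year)."""
    years = math.log(target / current) / math.log(annual_growth_factor)
    return years * 12


# 400 million conversations a day today, one billion as the target.
months = months_to_reach(400_000_000, 1_000_000_000)
print(f"About {months:.0f} months at exact annual doubling")
```

With exact annual doubling, the billion-a-day mark arrives in roughly 16 months. An 18 to 24 month horizon corresponds to growth somewhat slower than doubling, so the estimate leaves some margin.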
What is the main challenge in scaling to the massive amount of social conversations?
Sysomos provides analytics on aggregated data. For example, users can search all mentions of a company across 100 billion documents, compute sentiment for each, and show the resulting sentiment trends. Hence, each Sysomos query analyzes a substantial amount of data, even if the final display contains just two numbers for positive and negative sentiment. Our aim at Sysomos is to ensure that these queries, despite the complexity of age/gender/location filters, finish processing in a few seconds, and that is the most difficult part.
This is very different from the challenges faced by B2C companies like Facebook or Google, which have over a billion users but handle simpler queries. Twitter’s own public search, for example, limits itself to only 4 days of history and does not allow complex queries. We constantly admire how the engineers at those companies manage to operate at that scale.
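To make the shape of such a query concrete, here is a toy sketch of an aggregate sentiment query with optional demographic filters. It is not how Sysomos implements it: the `Post` record, the `sentiment_counts` function, the precomputed sentiment label, and the sample posts are all assumptions made for illustration, and a linear scan like this would never finish in seconds over 100 billion documents.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Post:
    """Hypothetical record; real documents carry far more fields."""
    text: str
    sentiment: str  # "positive" or "negative", assumed precomputed here
    gender: str
    location: str


def sentiment_counts(posts: Iterable[Post], keyword: str,
                     gender: Optional[str] = None,
                     location: Optional[str] = None) -> Tuple[int, int]:
    """Return (positive, negative) counts for posts mentioning `keyword`,
    optionally restricted by author gender and location."""
    needle = keyword.lower()
    counts: Counter = Counter()
    for post in posts:
        if needle not in post.text.lower():
            continue  # not a mention
        if gender is not None and post.gender != gender:
            continue
        if location is not None and post.location != location:
            continue
        counts[post.sentiment] += 1
    return counts["positive"], counts["negative"]


# Made-up sample data for a fictional brand.
sample_posts = [
    Post("Love the new Acme phone", "positive", "female", "Toronto"),
    Post("Acme support was terrible", "negative", "male", "Toronto"),
    Post("acme delivers again", "positive", "male", "London"),
    Post("Nothing to see here", "negative", "male", "London"),
]

print(sentiment_counts(sample_posts, "acme"))
print(sentiment_counts(sample_posts, "acme", location="Toronto"))
```

Even in this toy version, every matching post has to be touched to produce just two numbers, which is why the amount of data analyzed per query, rather than the number of users, is the hard part.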
How do you scale? What is Sysomos’ secret sauce?
Honestly, there is no single secret sauce. It all boils down to many small, intelligent decisions where none of them individually is groundbreaking, but collectively they make the system very powerful. We have to be careful about every single line of data access code, because a single performance bottleneck can bring the whole system down. Sysomos is about a lot of persistent diligence, rather than a magic bullet.
What is the biggest challenge faced by Sysomos engineering?
The obvious one is our back-end infrastructure that manages to search through 100 billion documents in a couple of seconds. But the even bigger challenge is design. I am a strong believer in “applied analytics”, meaning instead of inundating the user with lots of numbers or data, we show them the things that actually matter. We spend more time designing user experience than anything else. The design philosophy is to be powerful but without the fuss, and that’s why our UI looks deceptively simple. The idea is to make ‘commonly done’ things super simple, while allowing for complex analyses and advanced reporting to be possible within the same framework rather than overwhelming the user with a million different options and data points.
What does the future hold for Sysomos?
That’s a loaded question. With a growing number of users on social networks and companies adopting social tools across departments, there is a lot that still needs to be done. Our three big focus areas going forward will be to make sure we continue to scale up our operations as social media grows, continue to lead with our user-focused design, and most importantly, uncover real use cases as they evolve over time and respond by building technology that helps our customers do what they want to do easily. The most important ingredient of that success is feedback from our clients, and I look forward to hearing from you about where you would like to take Sysomos.