Amit Vij, Chief Executive Officer and Co-Founder, Kinetica
Amit is a co-founder, board member, and CEO of Kinetica, responsible for the company's vision and for its administrative and executive decisions. With a background in computer engineering, he has over a decade of software development experience in the commercial and federal space, with an emphasis on analyzing and visualizing big data, and helped architect Kinetica. Amit served as the chief GEOINT technical architect, as a contractor, for a major Top Secret cloud initiative between the US Army, NSA, and the DIA. Prior to Kinetica, Amit was a subject matter expert on geospatial intelligence with General Dynamics AIS and had been chief architect for several Department of Defense and Department of Homeland Security contracts. Amit received a B.S. in Computer Engineering from the University of Maryland with concentrations in Computer Science, Electrical Engineering, and Mathematics.
Could you tell us a little about Kinetica in your own words? What “pain point” are you looking to solve?
Kinetica is a distributed, in-memory database accelerated by GPUs. Kinetica delivers unprecedented performance for ingesting, analyzing, and visualizing data for real-time insights and actions. Kinetica converges machine learning, deep learning, natural language processing, and location-based analytics in an easy-to-use, SQL-compliant relational database for 100X query performance improvements on 1/10th the hardware.
GPUs were originally used to power game consoles by accelerating graphical rendering; now Google, Facebook, Amazon, Microsoft, and Tesla are popularizing them in the datacenter and the cloud. Each of these companies has drastically increased its investment in GPUs over traditional CPUs, proving their value for corporate computing. What's driving this move to GPUs is streaming analytics, deep learning, and Artificial Intelligence, led by an explosion of data and data types. Like graphics acceleration, analytics acceleration is all about math, visualization, and massive data quantities. Throw in Machine Learning and Artificial Intelligence and it's easy to see why GPUs are putting a spotlight on the limitations of CPUs and will ultimately dominate every enterprise. While GPUs are the focus of the new datacenter, it will be a database like Kinetica that unlocks the real benefit of their speed with an easy-to-use data management solution for fast OLAP, convergence of AI and BI, and location-based analytics.
How did you get here with the company?
In 2009, the United States Army Intelligence and Security Command (INSCOM) at Fort Belvoir sought the capability to track terrorist and other national security threats in real time. The solution needed to produce instant results and visualize insights across massive streaming spatial and time-series datasets. Nothing existed in the market that met their needs. Data warehouses were too slow, NoSQL wouldn’t scale, Hadoop was too complex, and premium in-memory solutions promised real time but didn't deliver.
After extensive testing and research revealed no existing system capable of meeting the Army's needs, Nima Negahban (now Kinetica's co-founder and CTO) and I built a new database from the ground up, centered on massive parallelization utilizing the GPU, to explore and visualize data in space and time, an approach that has since been patented.
After extensive research and development, the new database, known as GPUdb, was launched at global scale, ingesting and analyzing over 200 different data feeds, from drones and mobile devices, to track terrorist activity. With the growth of data from IoT, transactions, and other sources, business users started to run up against the challenge of streaming and analyzing data in truly real time, so we commercialized the product. In 2014, USPS deployed GPUdb into production to optimize routes and increase accuracy. Since then, we have rebranded the product and the company as Kinetica and continued to invest heavily in R&D, sales, and marketing.
Today, we have the fastest performance of any GPU-powered database, with 100x+ gains over traditional RDBMS, NoSQL, and in-memory databases on 1/10th the hardware, and successful production deployments in government and enterprise: immobilizing terrorists; real-time location-based analytics for route optimization; smart-grid infrastructure management; genomics research; recommendation engines in retail; and real-time risk management.
Why GPUs? Why is that more often effective than more traditional methods?
GPUs are capable of processing data up to 100 times faster than configurations containing CPUs alone. The reason for the improvement is massively parallel processing, with some GPUs containing upwards of 4,000 cores, roughly two orders of magnitude more than the 16-32 cores found in today's most powerful CPUs. The GPU's small, efficient cores are better suited to performing similar, repeated instructions in parallel, making it ideal for accelerating the compute-intensive workloads that will characterize the Cognitive Era. GPUs deliver better performance at lower cost, enabling enterprises to consolidate, streamline, and simplify IT infrastructure and do more with less.
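To make the data-parallel idea concrete, here is a minimal sketch (illustrative only, not Kinetica or CUDA code): a GPU applies the same instruction to thousands of elements at once, which we can emulate by partitioning a column of values across simulated "cores" and running the same kernel on each chunk.

```python
# Illustrative sketch of the data-parallel model behind GPU acceleration.
# The "cores" here are simulated; on a real GPU the chunks would execute
# concurrently on thousands of physical cores.

def sequential_scale(values, factor):
    """CPU-style processing: one element at a time."""
    out = []
    for v in values:
        out.append(v * factor)
    return out

def data_parallel_scale(values, factor, num_cores=4):
    """GPU-style processing: partition the data, apply the same kernel
    (multiply by factor) to every chunk, then concatenate the results."""
    chunk = (len(values) + num_cores - 1) // num_cores
    partitions = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    # Each partition would run on its own core in parallel.
    results = [[v * factor for v in part] for part in partitions]
    return [v for part in results for v in part]

data = list(range(10))
# Both strategies compute the same answer; only the execution model differs.
assert sequential_scale(data, 3) == data_parallel_scale(data, 3)
```

The payoff comes when the per-chunk work actually runs concurrently: with thousands of cores, the wall-clock time approaches the time for a single chunk rather than the whole column.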
How is Kinetica so amazingly fast? What's the difference compared to your competitors? Can you tell us how it works?
There are several primary benefits and competitive differences to Kinetica's GPU-based architecture. First, we offer unprecedented performance: Kinetica can ingest streaming data while delivering analytic results and producing visualizations on that data within milliseconds. Kinetica is designed from the ground up to take advantage of GPUs, which have 4,000-plus cores per device, versus 8 to 32 cores per CPU-based device. GPUs are well suited to performing repeated, similar instructions in parallel for the compute-intensive workloads required of large data sets. GPU cores crunch this kind of data far more efficiently and quickly than CPUs, which process it largely sequentially.
Second, Kinetica scales predictably: you can easily scale up or out. Data written to Kinetica is automatically routed in parallel to nodes across the cluster, and OLAP queries are executed using fully distributed GPU-accelerated processing across the cluster. Kinetica’s vectorized kernel and data structures are optimized for compute-intensive operations typical of analytics and machine learning workloads.
Third, Kinetica is an in-memory database that removes throughput bottlenecks by managing data in-memory. Kinetica maintains data in GPU VRAM and system memory with persistence on disk to remove I/O bottlenecks and deliver millisecond query response.
Additionally, with User-Defined Functions using GPUs, Machine Learning/AI libraries such as TensorFlow, BIDMach, Caffe, and Torch can run in-database alongside, and converged with, BI workloads. Kinetica can easily plug into existing data architectures, and the ops team doesn’t need to spend much time tuning, indexing, or tweaking their system compared to traditional CPU-based solutions. It’s also easy to consume data, since we offer free-text search, a native visualization engine, and plug-ins with BI applications such as Tableau, Kibana, and Caravel.
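As a rough illustration of the distributed write routing described above, here is a hypothetical hash-sharding sketch. The node names, shard key, and hashing scheme are made up for illustration and are not Kinetica's actual mechanism; the point is only that a deterministic function spreads rows across the cluster so queries can then run on every node in parallel.

```python
# Hypothetical sketch of hash-based row routing across cluster nodes.
# Node names and the sharding function are illustrative assumptions.
import hashlib

NODES = ["node-0", "node-1", "node-2"]

def route(shard_key: str, nodes=NODES) -> str:
    """Deterministically map a row's shard key to a node, so writes are
    spread across the cluster and the same key always lands in one place."""
    digest = hashlib.md5(shard_key.encode()).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

# Hypothetical incoming rows keyed by a sensor id.
rows = [{"id": f"sensor-{i}", "reading": i * 1.5} for i in range(6)]

placement = {}
for row in rows:
    placement.setdefault(route(row["id"]), []).append(row)

# Every row lands on exactly one node, and routing is stable per key.
assert sum(len(v) for v in placement.values()) == len(rows)
assert route("sensor-0") == route("sensor-0")
```

With data partitioned this way, an OLAP query can be fanned out to all nodes at once, each node scanning only its shard on its local GPUs, and the partial results merged at the end.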
Which industries can be helped most by what you're offering?
Any organization that has use cases in real-time OLAP, convergence of AI and BI, and location-based analytics would benefit from using a GPU-accelerated database. These types of use cases include sentiment analysis, anomaly detection and fraud prevention, resource allocation on the fly, terrorist tracking, energy generation and distribution optimization, inventory tracking, recommendation engines, and customer experience. Businesses across a wide range of industries can benefit: energy, telco, finance, retail, and healthcare, to name a few. In finance, for example, Kinetica helps companies get faster and more flexible insight into financial operations, customers, and markets for fraud prevention, risk management, and compliance. Retailers use Kinetica to track and analyze huge volumes of moving assets and inventory. Healthcare organizations take advantage of Kinetica’s brute-force computer power for patient monitoring/care, medical research, and billing/customer satisfaction improvement.
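To show the shape of a location-based analytics query like the asset-tracking examples above, here is a tiny stand-alone sketch: filtering moving assets to those within a radius of a point. The asset names, coordinates, and depot are hypothetical; a database like Kinetica would run this kind of predicate over billions of rows rather than a Python list.

```python
# Illustrative geofence filter: which assets are within 100 km of a depot?
# All names and positions are made-up example data.
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))  # 6371 km = mean Earth radius

# Hypothetical asset positions: (id, latitude, longitude).
assets = [
    ("truck-1", 38.9072, -77.0369),   # Washington, DC
    ("truck-2", 39.2904, -76.6122),   # Baltimore
    ("truck-3", 40.7128, -74.0060),   # New York
]

depot = (38.8048, -77.0469)  # hypothetical depot near Alexandria, VA

nearby = [aid for aid, lat, lon in assets
          if haversine_km(lat, lon, *depot) < 100.0]
assert nearby == ["truck-1", "truck-2"]  # New York is well outside 100 km
```

In a streaming deployment the positions update continuously, so the same filter is re-evaluated in real time as rows are ingested.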
What is the importance of using AI / machine learning with large datasets?
AI experiments have made it out of the lab and are now being deployed into the real world by leading companies. Amazon has smarter buying recommendations, Facebook has automatic facial recognition, and Netflix offers movie recommendations. There are also virtual assistants such as Siri and self-driving cars such as Tesla's.
Businesses are looking to AI to improve customer experience and operational efficiency and to increase sales. Opportunities abound for AI, such as making better buying recommendations for online retailers, judging the risk of various trade decisions at a bank, accelerating drug development in life sciences, or smarter planning for logistics companies.
Yet deploying AI in business isn't easy. Data scientists typically rely on costly, complex, and specialized tools and hardware. They frequently need to copy large volumes of data into these specialized environments in order to build and train their models. Then, once the AI models are working in the lab, it can be difficult to make that functionality available to business users. Moreover, it's difficult to find, train, and retain the people with the right skills to develop, deploy, and manage AI models.
For companies who are only now beginning to harness the power of big data, what advice would you give? Where should they start?
To harness the power of big data, companies must start with the end goal and work backwards.
Are you working with fast streaming data or slower batch data? Decisions like these affect which technologies you choose.
Start by defining the business objectives, such as reducing customer churn, improving customer experience, improving operational efficiency, and eliminating fraud, waste, and abuse. Think about the data and analytics required to measure and improve the processes that drive those business outcomes. Finally, put the technology systems in place to manage big data and make actionable insights pervasively available within the enterprise. Over the years, we have worked with numerous customers who have successfully navigated big data projects, and I want to share a few best practices:
- Start Small: Pick a pressing problem or business objective and focus on it to get a quick win and drive adoption.
- Fail Fast: Implement systems and processes that help you experiment and iterate quickly.
What advice would you give to a budding entrepreneur who wanted to get involved in the data industry? What about AI and machine learning?
It's a fierce market out there, so make sure you build something that has major distinguishing factors and a strong value proposition.
AI and machine learning are hot emerging technologies, and they are creating a market of their own, with companies starting to pay for these capabilities. This is an excellent area in which to develop something new and cutting edge.
- Build cognitive data products, applications, and services that converge data management, analytics, AI, algorithms, and visualizations for faster time to value
- Capitalize on trends such as cloud, mobile, digitization, AI, IoT
- Build standard-based, open, pluggable solutions that are easy to integrate, extensible, and preserve existing investments
- Build solutions that simplify the AI and machine learning pipelines and make these techniques accessible to line-of-business
Where do you see the crossover with AI / big data / and IoT in 10 years?
Data is the common thread that connects AI, big data, and IoT. I see these areas converging together with enterprises deploying data-driven cognitive IoT applications that connect people, processes, and devices. Data will freely flow from edge to cloud in real-time and AI and algorithms will provide rich, contextual insights to sense, understand, and respond to business opportunities.
What are your thoughts on AI? Do you believe the fears that many people express over artificial intelligence are warranted? Are you worried about humanity’s future alongside AI?
I see AI-based cognitive applications complementing humans by:
- Delivering higher combined value
- Removing complexity
- Taking on routine tasks and freeing humans for high-quality, high-value work
I don't really believe in the fear, but the information-privacy concern with many of these vendors is real. Google, Apple, and Amazon have all kinds of data on us now, and we need to be careful about how they use that information. On government projects, we have very well-defined laws to protect US citizens here; I believe a great degree of similar reform needs to be applied to commercial enterprises.
Speaking of technological concerns, we are seeing a lot of cyberthreats lately like Brickerbot, WannaCry, and, of course, the Mirai botnet is all but forgotten. What makes Kinetica's security different and safer for its users?
Kinetica was incubated as a US Army and NSA project with strict security requirements. Security has been a core tenet of the product since its creation, with years of engineering applied to it, including row- and cell-level security. Kinetica features comprehensive, across-the-stack security, including:
- Authentication: LDAP authentication with support for Microsoft Active Directory and OpenLDAP.
- Authorization: Role-based access control with users, roles, and permissions.
- Encryption: Full encryption for data in motion and at rest with SSL/HTTPS, TLS, AES-256, and more.