SYSTRAN: The Machine Translation Company Putting Data Privacy First

May 11, 2018 2:41 pm
The following is an interview we recently had with Denis A. Gachot, CEO of SYSTRAN.

1. What's the history of SYSTRAN? Where and how did you begin?

DG: The Georgetown-IBM experiment of the mid-1950s brought language translation experts from around the globe together on a mission to prove the viability of machine translation. Among those professionals was Dr. Peter Toma, a Hungarian polyglot and computer programmer. Toma had a vision of achieving peace through communication by combining linguistics with computer science.

Toma founded SYSTRAN when the United States Air Force needed to translate large volumes of scientific and technical documents from Russian to English during the Cold War. This was the beginning of our ongoing relationship with the US Department of Defense. Defense agencies around the globe still rely on us heavily for sensitive translation work that bridges communication gaps between nations.

Our commercial work is equally high-touch: we work with multiple Fortune 500 companies, including Adobe, Ford and Groupe PSA. Google used our platform for its language tools until 2007, when it developed and launched Google Translate.

2. What specific problem does SYSTRAN solve? Who do you solve it for?

DG: We offer secure, automated translation to global corporations that need to understand and protect their multilingual data. Our enterprise translation server runs completely on neural networks. It integrates with everyday tools such as Word, Skype, PowerPoint and Outlook. More importantly, it can be installed securely behind your firewall, and translations can be processed offline. We call it Pure Neural Machine Translation (PNMT).

Our users care about their customers’ privacy and about protecting intellectual property. We’re not just service providers; we’re educators. Data collection is happening by the minute, and the average user is unaware of how widespread it is. More important still are the vulnerabilities it creates. It’s our responsibility to explain the importance of data privacy in translation and to provide a solution that mitigates the associated risks.

3. How does your solution work?

DG: Our user tools are designed to keep users moving. For example, translations can be processed directly within your Microsoft applications. Mass translations of data can be processed via our enterprise server—this is common in eDiscovery, compliance, product documentation and training materials.

Within our suite of intelligent language tools, we created the Anonymizer to help our customers protect their own customers’ personally identifiable data. The Anonymizer masks personal information with symbols such as hash signs, at signs and dollar signs. This tool was conceptualized by ReedSmith, a top-20 global law firm.

We refer to this as the last layer of privacy. Security is meant to stop people from breaking in, but hackers will inevitably find a way through, as the Equifax breach showed. The Anonymizer can keep the data private in transit. Think of your data bank as a digital safe holding your passport, ID and other personal information. Anyone attempting to steal it would find that personal information redacted.

Imagine a large social network had its users’ data mishandled by an analytics company. Had the Anonymizer been in use, those users would have been unidentifiable because their stored information would have appeared as random symbols. All we’d know is that ‘##### &&&&&&&’ likes the color purple and action movies, went to the gym on X day and is friends with ‘%%% ((((’.

Companies store millions of forms containing personal information about customers, employees, vendors and more. Personal data is scattered across every page. Manually sifting through and deleting this data would take years. We ease that burden with the Anonymizer, which goes through swaths of documents and redacts data like a black marker. Picture government documents with information blacked out all across them; here, what we are hiding is personal information, to ensure data privacy.
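To make the masking idea concrete, here is a minimal sketch in Python. It is an illustration only, not SYSTRAN's implementation: the function name `mask`, the three patterns and the choice of symbol per pattern are all assumptions made for this example. Real personal-data detection, especially of names and addresses, needs far more than regular expressions.

```python
import re

# Hypothetical patterns for illustration only. Each entry pairs a regular
# expression with the symbol used to overwrite whatever it matches.
PATTERNS = {
    "email": (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "@"),
    "ssn": (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "#"),
    "phone": (re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"), "$"),
}


def mask(text: str) -> str:
    """Overwrite every matched span with a same-length run of its symbol."""
    for pattern, symbol in PATTERNS.values():
        # Bind the symbol as a default argument so each lambda keeps its own.
        text = pattern.sub(lambda m, s=symbol: s * len(m.group()), text)
    return text


if __name__ == "__main__":
    print(mask("Reach Ann at ann@example.com or 555-123-4567."))
```

In this sketch the name "Ann" is left untouched, because recognizing names requires language-aware entity detection rather than pattern matching. Keeping the masked run the same length as the original is a design choice that preserves document layout, at the cost of revealing how long the hidden value was.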

Facebook has been in the news for this recently. Users reasonably assumed that while Facebook was collecting their data, it was the only company with access to it. What happened recently with Cambridge Analytica is evidence to the contrary. Whenever your data is collected, it is at risk of being stolen. That’s why we value data security and privacy wherever possible.

4. What are the top trends you're seeing in machine translation?


DG: “I trust you with my life. I trust you with my identity. I trust you with my bank account. I trust you with my friends. I trust you with my preferences. I trust you with my whereabouts.”

Most of us do not realize that we’re making this judgement every time we digitize information. Hacking seems to have become a more pervasive threat in recent years, but in reality we’re making our data easier to access. Hacking is becoming more and more lucrative because consumers are leaving their valuable information in unsafe locations.

‘The song you listen to on your drive to work, the movie you watched last night, the YouTube video you learned from, the chocolate you almost purchased, the emoji you just started using, the hymns you hum in your living room, your nickname.’

All of this information is valuable to advertisers, publishers and anyone else searching for consumer data. Placing your data online is like leaving your wallet on the bar top while you’re at a restaurant. Unfortunately, this information gets accessed and misused by bad actors even when precautions are taken. This is a macro problem, and we are in no way solving it at that level. We are one step in the process of fixing the problem.

This behavior isn’t restricted to one generation; Boomers through Gen Z are all equally misled and misinformed. All of the work we’re doing is intended to set up future generations for success.

Data Security

For consumers, free almost always wins out over a paid product. Because online translation tools are free, most people use them. What people do not know is that free usually means the provider can freely use and distribute any data you input. We hear people say all the time, “I don’t have anything to hide…” This attitude stems from a general ignorance of the risks at hand. With awareness, this will change. Companies can face massive fines if they don’t take precautionary measures. Translation on a massive scale collects a massive amount of data, and ensuring the privacy of that data needs to be a top priority in our industry.

New Jobs

Popular belief positions Artificial Intelligence (AI) as a job killer. It will replace some jobs, and it’s a good idea to be clear on which ones. But it will improve others and create entirely new ones. Within the translation industry, there are big data projects that were previously financially unattractive. Now, with PNMT, those projects are happening and jobs are being created to manage them. Machine translation is affecting industries beyond translation itself: consider voice activation for mobile and home devices. There are jobs in those industries managing NMT technology, and as the technology expands, so will the career opportunities.

Regulations are Increasing

The General Data Protection Regulation (GDPR) of the EU is a comprehensive data protection rule requiring any company with customers in Europe to ensure the protection of those customers’ data. Penalties can reach 4% of global annual revenue or 20 million euros, whichever is higher. With penalties this high, no company can afford to ignore data privacy any longer. The GDPR is the first of a swath of regulations to follow.
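As a rough illustration of what those figures mean in practice, the sketch below computes the ceiling of the upper penalty tier, assuming the regulation's "whichever is higher" rule between 20 million euros and 4% of worldwide annual turnover. The function name and the sample turnover figures are hypothetical. Actual fines are set case by case and can be far below this ceiling.

```python
FLAT_CAP_EUR = 20_000_000  # fixed ceiling of the upper tier, in euros
TURNOVER_PERCENT = 4       # share of worldwide annual turnover, in percent


def gdpr_max_fine_eur(annual_turnover_eur: int) -> int:
    """Return the upper-tier ceiling: the higher of the flat cap and 4% of turnover."""
    # Integer arithmetic on whole euros avoids floating-point rounding.
    turnover_based = annual_turnover_eur * TURNOVER_PERCENT // 100
    return max(FLAT_CAP_EUR, turnover_based)


if __name__ == "__main__":
    # Hypothetical companies: one small enough that the flat cap applies,
    # one large enough that the turnover-based figure takes over.
    for turnover in (100_000_000, 5_000_000_000):
        print(turnover, gdpr_max_fine_eur(turnover))
```

The crossover sits at 500 million euros of annual turnover: below it the flat 20 million euro figure is the ceiling, and above it the 4% figure is.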

5. What's the future of machine translation?

DG: Neural Machine Translation has achieved fluency in even the most hard-to-translate languages, such as Arabic, Chinese and Russian. As quality improves, user experience innovations follow, for example the advent of the earpiece translator and the combination of machine translation with voice commands for productivity gains. We’ll continue to see machine translation intersect with emerging technology such as augmented reality and IoT, and support the collaboration of geographically dispersed employees.