Tech Target: Open Source Machine Learning Accelerates Winemaking
7/11/2019 7:19:30 AM
Tech Target | By Johnny Yu | Jul 2, 2019
Using open source machine learning, Palmaz Vineyards developed a tool to automatically alert winemakers to environmental conditions. Soon, the data flowed like wine.
Christian Gastón Palmaz wanted to make winemaking smarter.
The CEO of Palmaz Vineyards, based in Napa, Calif., has a bachelor’s degree in business from Trinity University but also a strong background in computer science. That expertise led him to build FILCS (pronounced “Felix”), which stands for Fermentation Intelligence Logic Control System. After implementing FILCS, Palmaz Vineyards overhauled its storage and data protection to deal with the data that came pouring out.
First cobbled together in 2010 from open source machine learning libraries and consumer-grade hardware, FILCS was designed to monitor conditions within the vineyard’s fermentation tanks and alert the winemakers if adjustments were needed. Normally, winemakers have to sample the tanks manually and repeatedly to check that they have enough oxygen and nutrients and are at the correct temperature and oxidation level.
“I was limited to the biggest, baddest computer I could get my hands on, which was gaming hardware,” Palmaz said. “I wanted to build a system that can somehow take care of those mundane details and give the winemakers a leg up.”
Palmaz said he believes he has the world’s first fully algorithmic fermentation control system. Powered by open source machine learning, FILCS monitors the environment of the fermentation tanks and uses its associations and historical data to calculate the chance that current conditions will lead to a bad result. It then notifies the winemakers when it is confident something needs to be adjusted. FILCS is not yet sophisticated enough to make changes on its own.
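The notify-only behavior described above can be sketched roughly as follows. This is a minimal illustration, not FILCS code: the sensor fields, the `risk_of_bad_result` scoring function, the 25 C target and the 0.9 alert threshold are all hypothetical stand-ins for whatever trained model and settings the real system uses.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold: only alert when the estimated risk is high.
ALERT_THRESHOLD = 0.9
# Hypothetical target temperature, used only by the stand-in model below.
TARGET_TEMPERATURE_C = 25.0


@dataclass
class TankReading:
    """One sensor sample from a fermentation tank (fields are assumed)."""
    tank_id: int
    temperature_c: float
    dissolved_oxygen_mg_l: float


def risk_of_bad_result(reading: TankReading) -> float:
    """Stand-in for a trained model: returns a risk estimate in [0, 1].

    Here the risk simply grows as temperature drifts from the assumed
    target; a real system would learn this from historical data.
    """
    drift = abs(reading.temperature_c - TARGET_TEMPERATURE_C)
    return min(1.0, drift / 10.0)


def check_tank(reading: TankReading) -> Optional[str]:
    """Return an alert message for the winemaker, or None if no alert.

    The function only notifies; it never changes tank settings itself.
    """
    risk = risk_of_bad_result(reading)
    if risk >= ALERT_THRESHOLD:
        return (f"Tank {reading.tank_id}: estimated risk {risk:.0%}, "
                "adjustment suggested")
    return None
```

Calling `check_tank` on each new reading and forwarding any non-empty message to the winemakers would mirror the alert-but-do-not-act design the article describes.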
Using off-the-shelf, open source machine learning libraries such as TensorFlow to build data sets, create associations, then train new associations, Palmaz spent four years hacking FILCS together. After buying about $70,000 of Promise Technology hardware and hardening it to work in high-moisture environments, Palmaz officially launched FILCS in 2014.
The new system churned out data faster than the vineyard could produce wine. FILCS started out monitoring six out of the vineyard’s 24 fermenters, and it was generating about 1 GB of data every hour. Palmaz decided he couldn’t handle it all on premises and spent the next year adapting FILCS for the public cloud.
“2014 was like drinking from a fire hose. I was getting worried about scalability,” Palmaz said.
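To get a feel for the scalability worry, the figures above can be extrapolated with some simple arithmetic. The sketch below assumes the 1 GB per hour came from the six monitored fermenters and that data volume would scale linearly with the number of fermenters; neither assumption is stated in the article, and the function name is purely illustrative.

```python
# Figures quoted in the article.
OBSERVED_GB_PER_HOUR = 1.0
FERMENTERS_MONITORED = 6
FERMENTERS_TOTAL = 24


def projected_gb_per_hour(observed: float = OBSERVED_GB_PER_HOUR,
                          monitored: int = FERMENTERS_MONITORED,
                          total: int = FERMENTERS_TOTAL) -> float:
    """Scale the observed rate to all fermenters (assumes linear scaling)."""
    return observed * total / monitored


full_rate = projected_gb_per_hour()   # GB per hour with every tank monitored
gb_per_day = full_rate * 24           # GB per day of continuous monitoring

print(f"All {FERMENTERS_TOTAL} fermenters: about {full_rate:.0f} GB/hour, "
      f"{gb_per_day:.0f} GB/day")
```

Under those assumptions, monitoring every tank would quadruple the data rate, which is the kind of growth that makes on-premises capacity a concern.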
FILCS moved to the cloud in 2015, but it did not stay there for long. Latency when retrieving data was an issue, because fermentation is a time-sensitive process. Palmaz also said the volume of data and the egress charges were growing to the point of unsustainability.
“To put a petabyte on the cloud is one thing, but pulling that data off the cloud was expensive,” Palmaz said.
Tasked with bringing FILCS back on premises, Palmaz started to see the limitations of his Promise storage. There was no onboard operating system to manage data, which Palmaz didn’t realize would be important to him at the time he purchased the systems. The Promise hardware had the smarts to manage the RAID setup, but he needed something that could handle data replication so he didn’t have to put that burden on his virtual environment.
After doing research and looking at NetApp, QNAP and Synology, Palmaz bought two Synology FlashStation NAS devices for his primary data center at Palmaz Vineyards and three Synology RackStation units for storing older data at an off-site location. Dark fiber connected the sites, and the vineyard used Veeam Software to back up the RackStations. The FlashStation devices used Synology backup.
Palmaz found Synology to be the most approachable of the enterprise-grade, high-IOPS hardware he looked at, because it was simple enough that he could deploy it himself. “I was intimidated by the big enterprise gear, because I felt like I needed to hire someone for implementation,” he said.
Palmaz did not leave the cloud completely, however. On premises, he keeps seven daily backup copies at his primary site and 12 monthly captures at the off-site location. Everything older than that is put in virtual tape libraries (VTLs) on Amazon Glacier. Palmaz said he has about 280 TB in VTLs for long-term preservation.
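The tiering described above can be expressed as a small lookup on the age of a restore point. This is only a sketch under stated assumptions: the exact cutoffs, the tier labels and the `storage_tier` function are hypothetical, and the real policy is enforced by the backup products rather than by code like this.

```python
from datetime import date

DAILY_COPIES = 7      # daily copies kept at the primary site (from the article)
MONTHLY_COPIES = 12   # monthly captures kept off-site (from the article)


def storage_tier(backup_date: date, today: date) -> str:
    """Guess which tier would hold a restore point of a given age.

    Assumed cutoffs: under 7 days old -> primary site; up to 12 calendar
    months old -> off-site; anything older -> cloud archive.
    """
    age_days = (today - backup_date).days
    if age_days < 0:
        raise ValueError("backup_date is in the future")
    if age_days < DAILY_COPIES:
        return "primary-site daily copy"
    age_months = ((today.year - backup_date.year) * 12
                  + (today.month - backup_date.month))
    if age_months <= MONTHLY_COPIES:
        return "off-site monthly capture"
    return "cloud archive (VTL)"


today = date(2019, 7, 2)
for d in (date(2019, 6, 30), date(2019, 1, 15), date(2017, 1, 1)):
    print(d.isoformat(), "->", storage_tier(d, today))
```

Nothing is ever deleted in this scheme; data only moves to a cheaper, slower tier as it ages.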
“There’s something wrong about deleting something that you worked so hard to create. Once it’s gone, it’s gone forever,” Palmaz said. “There’s always a way to store it. For me, it should always exist.”
Although Palmaz has storage and backup figured out, he doesn’t have a disaster recovery (DR) plan that he’s confident in. If he lost his main server, Palmaz said he’d have to create a virtual environment and run it on hardware from the secondary site. That would allow FILCS to “reincarnate” and continue to record data, but it wouldn’t have the compute power for its smarter functions.
“I do have a workflow for dealing with disaster. It’s not pretty,” Palmaz said. “Frankly, I never tested it, so it probably wouldn’t work.”
Palmaz said he would rather invest in primary and backup technology than DR. “The problem with preparedness is it requires redundancy, which doubles your footprint and costs,” he said. “I’d rather have an aggressively upgraded set of gear.”
Palmaz said he’s not done exploring open source machine learning for FILCS. Recently, he built a prototype of FILCS using Apple Core ML and plans to pit it against the current build to see if it makes FILCS smarter.
“It’s going to be a battle of the machine learning libraries,” Palmaz said.