With another year almost behind us, it’s time to sit back and consider what we’ve just been through. It’s been another active 12 months in the big data space, with plenty of news for the intrepid big data reader.
We’ve had an eventful last year here at Datanami, which will soon complete the transition to BigDATAwire (keep an eye out for that change in January). With that in mind, it’s worth taking a look at the top stories in each of the past 12 months. The rankings are according to pageviews.
January: All Eyes on Snowflake and Databricks in 2022
The new year kicked off with a lot of anticipation for what Databricks and Snowflake would do. The two companies did not disappoint, with a host of new capabilities and continued strong growth (although the much-anticipated Databricks IPO never materialized). These two data giants will be interesting to watch in 2023 too–although it will be tough to cover their respective user conferences in June, which take place on the same days (with Databricks in San Francisco and Snowflake in Las Vegas).
February: Snowflake, AWS Warm Up to Apache Iceberg
Apache Iceberg–the new open table format that solves a lot of consistency problems in big data lakehouses–came on strong in late 2021, and its usage grew through 2022. We named Ryan Blue, the co-creator of Iceberg, as one of our people to watch. Databricks, for what it’s worth, announced support for Iceberg later in the year (it also open sourced its Delta table format, providing competition to Iceberg, along with Apache Hudi).
March: Home Depot Finds DIY Success with Vector Search
Vector search was one of the most compelling new technologies to find traction in 2022. We got an inside view of how the technology (often deployed using vector databases) helped home improvement giant Home Depot supercharge its customers’ Web and mobile searches by using neural networks to infer what they’re looking for instead of maintaining a massive dictionary of commonly misspelled words.
April: The Modernization of Data Engineering at Capital One
Democratization of data science and data analysis may be the goal, but data engineering is often the path to get there. The folks at Capital One realize this, which is why the company has poured resources into data engineering to streamline access to data. Its internal data marketplace combines a data catalog, an automated data pipeline development tool, data governance, and data quality, and it’s held together with a fine data mesh.
May: Anaconda Unveils PyScript, the ‘Minecraft for Software Development’
Python has become the lingua franca for data science. That’s not news. But with Anaconda’s new PyScript, which CEO Peter Wang unveiled at the PyCon 2022 conference, the company helped to lower the barrier to developing data science applications in the comfort of a Web browser.
June: EMR Serverless Now Available from AWS
Apache Hadoop has long ceased being the center of gravity of the big data world. But Hadoop’s legacy lives on, including at AWS, where its Amazon EMR offering continues to be a smash hit among customers using Apache Spark, Apache Flink, Apache Hive, Presto, and even MapReduce code. And with its new serverless option, Amazon EMR (which used to stand for Elastic MapReduce but doesn’t officially anymore) helped to eliminate one of the big usability hurdles that afflicted that old elephant Hadoop.
July: Mathematica Helps Crack Zodiac Killer’s Code
Sometimes, stories languish on Datanami for months before readers finally realize what they’ve been missing. Such was the case with this January 2022 story, which described how a trio of men from Virginia, Australia, and Belgium used the Mathematica statistical package from Wolfram to crack the Zodiac Killer’s code. Discover Magazine gets credit for first reporting this story. Unfortunately, the identity of the Zodiac Killer, the serial killer who terrorized Northern California more than half a century ago, remains unresolved.
August: Datanami People to Watch 2022
We first announced the 12 Datanami People to Watch back in February, and ran interviews with the group over the course of the year. It’s a great group of leaders, including Yu Xu (TigerGraph), Lauren Woodman (DataKind), Venkat Venkataramani (Rockset), Adam Selipsky (AWS), Matthew Scullion (Matillion), Satyen Sangani (Alation), Andrew Ng (LandingAI), Tristan Handy (dbt Labs), Susan Gregurick (NIH), Zhamak Dehghani (Thoughtworks), Joy Buolamwini (MIT Media Lab), and Ryan Blue (Tabular). Keep an eye out in early 2023 for the next batch.
September: Walmart Gives Data and Analytics Monetization A Try
As the world’s largest retailer, Walmart knows a thing or two about selling. With the launch of its new Walmart Data Ventures arm earlier this year, the company rolled out new offerings in its Walmart Luminate line, such as Shopper Behavior, Channel Performance, and Customer Perception. The retail giant is not only selling its partners data about its store sales (2 billion market baskets per quarter, the company says), but also selling them prepackaged analytics insights.
October: Data Mesh Vs. Data Fabric: Understanding the Differences
There’s no denying it: Data fabrics and data meshes are hot. There’s also no denying that there’s a lot of confusion around these two concepts, which share some similarities but also have important differences. This article, which was published in October 2021, took a year to become the most-viewed story for a month, showing just how much demand there is for information on data meshes and data fabrics. Expect more interest in data meshes and data fabrics in the new year.
November: What Does Data and Analytics Need for 2023? Forrester Shares Predictions
Up to this point, Datanami had one ironclad rule: No new year predictions stories before Thanksgiving. (It was the only way to keep the PR people at bay.) For whatever reason, we broke the rule this year when we interviewed Forrester analyst Kim Herrington and published her analyst team’s predictions for 2023, and the result was the most-viewed story for the month. Go figure.
December: UC Berkeley Launches SkyPilot to Help Navigate Soaring Cloud Costs
One of the biggest emerging trends in 2022 was the rising cost of cloud computing. The folks running the computer science program at UC Berkeley realized this, which is why they created Sky Computing as the follow-on to RISELab (which succeeded AMPLab). One of Sky Computing’s first creations is SkyPilot, which lets users run batch machine learning workloads on any cloud. There’s no telling whether it will be as incredibly successful as Ray, which came out of RISELab, or Spark, which came out of AMPLab. But considering the attention staff writer Jaime Hampton’s story received, we’re not betting against it.
That’s it from us this year at Datanami. Happy holidays, and we’ll see you back here in 2023.