As we discussed in the previous post, the four aspects of machine learning history are not independent: if software and hardware are not available for certain theoretical models, such models can only be studied as proofs of concept, without broad adoption or significant business impact. In this post, we continue to share some stories on the software aspects of machine learning development.
The most popular machine learning approach today is the deep neural network, which can trace its roots back to the 1950s and 1960s. However, in the first few decades of AI development, much more effort was focused on rule-based systems. The rules can be derived from data with methods such as decision trees, but quite often they were translated directly from domain experts. For instance, to develop an AI solution to support a bank loan officer who evaluates and approves loan applications, we can collect and analyze the actual decision-making processes of many loan officers. Then, these human expert processes can be generalized and programmed into a computer system to automate the evaluation and approval of future loan applications.
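To make this concrete, here is a minimal sketch of what such a hand-coded expert rule might look like in Python. The field names and thresholds are hypothetical, purely for illustration:

```python
# A minimal sketch of a hand-coded expert rule for loan approval.
# Field names and thresholds are hypothetical, for illustration only.
def approve_loan(application: dict) -> bool:
    """Return True if the application passes the expert-derived rules."""
    if application["credit_score"] < 650:
        return False
    if application["debt_to_income"] > 0.4:
        return False
    if application["years_employed"] < 2 and application["loan_amount"] > 50_000:
        return False
    return True

print(approve_loan({"credit_score": 720, "debt_to_income": 0.30,
                    "years_employed": 5, "loan_amount": 20_000}))  # True
```

Every threshold here had to be elicited from a human expert and typed in by hand, which is exactly how early expert systems were built.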
Is there something wrong with this example? Clearly, this is not the same as what we have been discussing in class. The objective of machine learning is to discover the association between input (loan applications) and output (loan evaluation and approval) without being explicitly instructed how the output is derived from the input. Therefore, extracting relevant information from the input and learning its association with the output has been the mainstream of current machine learning research.
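For contrast, here is a minimal sketch of the learning approach, assuming we have labeled records of past decisions. The features and toy data below are made up, and scikit-learn (which we will meet again later in this post) is used only as one convenient example:

```python
# A minimal sketch of the learning approach: instead of hand-coding rules,
# fit a decision tree on past (application, decision) pairs.
# Feature columns and toy data are hypothetical.
from sklearn.tree import DecisionTreeClassifier

# Each row: [credit_score, debt_to_income, years_employed, loan_amount]
X = [[720, 0.30, 5, 20_000],
     [580, 0.55, 1, 40_000],
     [690, 0.35, 3, 15_000],
     [610, 0.50, 2, 60_000]]
y = [1, 0, 1, 0]  # 1 = approved, 0 = rejected (past decisions)

model = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(model.predict([[700, 0.32, 4, 25_000]]))  # predicted decision for a new application
```

No one tells the model what the thresholds are; it infers them from the historical data.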
Then why did early AI expert systems have to rely on expert knowledge? The simple answer is that it was much easier to implement with the software, hardware, and data available at the time. We will discuss the hardware and data aspects in future posts. For now, let's focus on the software needed to develop machine learning and AI systems.
With limited data availability and computing resources, early AI systems were largely custom-coded in high-level languages such as Lisp or Fortran, relying on small datasets or hand-encoded expert knowledge. Indeed, standard AI software packages and programming libraries were rare before the 2000s, and many researchers had to write highly specialized implementations of machine learning algorithms.
In the early 2000s, open-source software tools for data mining and machine learning began to appear. Weka, for instance, was released in 1999 by the University of Waikato in New Zealand, offering a broad range of algorithms in a user-friendly Java framework. Weka became popular among researchers, students, and business intelligence projects, and in 2005 it received the SIGKDD Data Mining and Knowledge Discovery Service Award. By the way, SIGKDD hosts one of the leading conferences in data mining and machine learning research, and I try to attend it every year.
As data volumes grew, Python started to gain traction in the machine learning community for its readability and extensive ecosystem of numerical libraries, such as NumPy, SciPy, Matplotlib, scikit-learn, and pandas. Meanwhile, the mid-2000s and early 2010s saw an explosion of libraries for statistical computing, such as R packages for machine learning. These libraries are nowadays the go-to toolkit not only for machine learning researchers in academia but also for industrial practitioners such as data scientists and machine learning engineers.
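As an illustration, a typical workflow with this ecosystem might look like the following sketch. The toy DataFrame stands in for a real dataset, and the column names are made up:

```python
# A minimal sketch of a typical workflow in the Python ML ecosystem.
# The tiny toy DataFrame below stands in for a real dataset.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

df = pd.DataFrame({
    "credit_score":   [720, 580, 690, 610, 750, 600, 700, 640],
    "debt_to_income": [0.30, 0.55, 0.35, 0.50, 0.25, 0.60, 0.33, 0.45],
    "approved":       [1, 0, 1, 0, 1, 0, 1, 0],
})
X, y = df[["credit_score", "debt_to_income"]], df["approved"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = LogisticRegression().fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```

A few lines of glue code replace what used to be weeks of specialized implementation work.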
The success of deep learning in the 2010s ushered in GPU-accelerated frameworks like Theano, TensorFlow (launched by Google in 2015), and PyTorch (developed at Facebook and released in 2016). These frameworks enabled researchers and practitioners to build, train, and deploy increasingly large neural networks, significantly accelerating innovation in areas such as computer vision, natural language processing, and speech recognition. As compute resources continue to grow, these tools make it easier than ever to experiment with new model architectures, handle massive datasets, and integrate machine learning solutions into real-world applications.
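For example, defining and training a tiny neural network in PyTorch can be sketched in a few lines. The architecture and the synthetic data below are purely illustrative:

```python
# A minimal sketch of defining and training a small neural network in PyTorch.
# The architecture and synthetic data are purely illustrative.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

X = torch.randn(256, 4)            # synthetic features
y = torch.randint(0, 2, (256,))    # synthetic binary labels

for epoch in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

print("final training loss:", loss.item())
```

The same few building blocks scale, with more data and GPUs, to the large models behind modern vision and language systems.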
Will we still use these software stacks in 2030? If history is any indication, we have to be prepared to keep learning something new. Clearly our current software frameworks, regardless of how popular they are now, have limitations that cannot be easily fixed in their next versions. Someday we will realize that the only way to overcome those limitations is to build something entirely new. But who knows? Maybe programming with software frameworks will no longer be necessary, and we will simply talk to ChatGPT to get all the work done.