Introduction
As large language models (LLMs) such as GPT-3, PaLM, and ChatGPT gain popularity, evaluating their capabilities thoroughly has become crucial. These models can interpret and generate human-like text, which makes them powerful tools across a wide range of applications.
However, with great power comes great responsibility — we