Not even close. The paper is questioning LLMs' ability to reason. The article talks about fundamental flaws of LLMs and how we might need different approaches to achieve reasoning. The benchmark is only used to prove the point. It is definitely not the headline.
Can you be more specific? What are you running on the command line and what is the result? Normally, you would install Chrome via a package manager, and that package manager would be responsible for updating it.