Human Language Assessment vs. Artificial Intelligence Assessments

As a company in an ever-changing marketplace, you’re always looking for ways to streamline recruitment. Language skills matter in many jobs, so it’s important to confirm that candidates have the right spoken and written language skills. Providing organizations and candidates with a balanced, fair, and comfortable assessment process has been the driving force behind our language assessment services since 2001. We collaborate with our talent acquisition partners to ensure a stress-free and efficient overall experience.

Let’s take a deep dive into AI assessments versus human-made assessments. Automated scoring technology for judging speech and writing has slowly gained global acceptance. Fewer people are asking, “Does it work?” The next question is, “Okay, how does it work for our purposes?” This blog aims to provide readers with the information they need to answer both questions.

One common misconception about automated scoring is that a computer has been trained to “do what humans do.” Automated scoring technology does not make computers behave like humans. Instead, it takes advantage of the fact that computers can be programmed to find and measure features of speaking and writing, combine and weight them in a multidimensional space, and determine which specific features and weightings best predict the score given by a human. This approach does not require the claim that a computer can “understand” a spoken utterance, which computers cannot do. Likewise, human scoring does not require the claim that a human judge can accurately count millisecond-level subphonemic timing events in natural speech, which computers can do better than humans. Both types of evaluation can consistently produce proficiency scores for spoken utterances.
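To make the idea of "features and weightings that best predict the human score" concrete, here is a minimal sketch in Python. It is not how any particular scoring engine works: the feature names (`speech_rate`, `pause_ratio`, `vocabulary_range`), the sample values, and the human scores are all invented for illustration, and a plain least-squares fit stands in for whatever statistical model a real system might use.

```python
# Illustrative sketch only: feature names, data, and scores are invented.

def fit_weights(feature_rows, human_scores):
    """Least-squares fit of an intercept plus one weight per feature."""
    rows = [[1.0] + list(r) for r in feature_rows]  # leading 1.0 = intercept
    n = len(rows[0])
    # Normal equations (X^T X) w = X^T y, stored as an augmented matrix.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, human_scores))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(n):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [a - factor * b for a, b in zip(aug[k], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def predict(weights, feature_row):
    """Machine score = intercept + weighted sum of the measured features."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], feature_row))


# Hypothetical measurements for six spoken responses.
FEATURE_NAMES = ["speech_rate", "pause_ratio", "vocabulary_range"]
SAMPLES = [
    (2.0, 0.30, 0.40),
    (3.0, 0.25, 0.55),
    (2.5, 0.10, 0.70),
    (3.5, 0.20, 0.60),
    (1.5, 0.35, 0.35),
    (3.0, 0.15, 0.80),
]
# Hypothetical scores a human rater gave the same responses (half-point bands).
HUMAN_SCORES = [3.0, 4.5, 5.0, 5.0, 2.5, 5.5]

weights = fit_weights(SAMPLES, HUMAN_SCORES)
for name, w in zip(["intercept"] + FEATURE_NAMES, weights):
    print(f"{name}: {w:.3f}")
for row, human in zip(SAMPLES, HUMAN_SCORES):
    print(f"human={human:.1f}  machine={predict(weights, row):.2f}")
```

The point of the sketch is that the computer never "understands" the response. It measures a few things, and the fitting step finds the weighting that tracks the human rater's scores most closely.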

The goal of this blog is to give an understandable explanation of how current state-of-the-art assessments use automated scoring technology, and to review research showing that the technology is effective in the assessment context as an alternative to human scoring. When used correctly and with care, automated scoring technology can make language tests much more interesting and useful in terms of delivery, task types, and scoring. At the same time, each part of the technology must be justified in the validity argument of the assessment. To ensure that score interpretation is consistent regardless of whether the test is computer-based or paper-based, the following questions may need to be investigated: Do students score the same on essay tasks when they handwrite or type them on computers? Is typing ability a disadvantage for some examinees? Do students use different strategic skills when reading passages on screen versus on paper?

One school of thought holds that because human judgments are fallible and subjective, machine scores should be compared not only to human judgments but also to external criteria or other measures of the same ability. Surprisingly, machine scores are more transparent than human judgments in some ways. Human raters evaluate language samples, consult scale descriptors, and use their judgment and experience to assign a final score. Still, there is no quantifiable way to measure how they weigh and combine the various pieces of information in an essay. In contrast, machine scoring is replicable: every piece of data analyzed, as well as its precise weighting in the scoring model, can be verified in the machine algorithms.

So, in automated models, it is possible to leave out irrelevant information (like the length of a sentence) and explicitly weight important information, like discourse markers, in a way that humans can’t do. Bernstein, Van Moere, and Cheng (2010) note that scoring models are data-driven, verifiable, and falsifiable.
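Here is a small Python sketch of what that transparency can look like in practice. The feature names, the weights, and the 0-to-1 feature scale are assumptions made up for this example, not values from any real scoring model. Sentence length is measured but deliberately given no weight, and every feature's contribution to the final score can be read off directly.

```python
# Illustrative sketch only: feature names and weights are hypothetical.
# Sentence length is intentionally absent from the weights, so it cannot
# influence the score.
WEIGHTS = {
    "discourse_markers": 0.6,
    "vocabulary_range": 0.3,
    "grammar_accuracy": 0.1,
}


def score_with_audit(features, weights=WEIGHTS):
    """Return the total score and each weighted feature's contribution."""
    contributions = {name: weights[name] * features[name] for name in weights}
    return sum(contributions.values()), contributions


# Hypothetical measurements for one essay, each on a 0-to-1 scale except
# mean_sentence_length, which is in words.
essay = {
    "discourse_markers": 0.8,
    "vocabulary_range": 0.5,
    "grammar_accuracy": 0.9,
    "mean_sentence_length": 23.0,
}

total, parts = score_with_audit(essay)
print(f"total score: {total:.2f}")
for name, value in parts.items():
    print(f"  {name}: {value:.2f}")

# Same essay with much longer sentences: the score does not move.
longer = dict(essay, mean_sentence_length=40.0)
print(score_with_audit(longer)[0] == total)
```

Scoring the same essay twice always gives the same result, and anyone auditing the model can see which features counted and by how much.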

There are a lot of things that humans are good at and a lot of things they’re not so good at. Let’s recap the highlights of AI vs. Humans:

While human evaluators can be used for language ability screening, they’re not a good solution when you’re hiring many candidates at once.
When human evaluators conduct language interviews, their results are sometimes biased because of factors like fatigue, distraction, and hiring quotas.
Our customer data has shown that AI is much better at consistently and accurately evaluating candidates based on their academic language ability.
Human evaluators are better for assessing business language abilities needed for the workplace.

At ELAM, we continuously conduct benchmarking audits to ensure unbiased and cohesive assessments. We want to make sure all candidates have an equal opportunity when it comes to their language abilities!
