Coding a Textual Corpus

Katja Thieme
Dec 3, 2018

Guest post by F. Ramorasata, Katrina Matwichuk, Reya Rana, Sophia Boulbol-Baker

Online Data: Tweets

When coding a corpus of online sources, in this case a corpus of tweets, you should use “advanced search” functions to look for keywords, hashtags, and tweets from certain time periods. Here is a picture of what Twitter’s advanced search option looks like.

After you have asked Twitter to do this search for you, you will be shown a list of tweets that match your parameters. This makes it much easier to choose which tweets to include in your corpus, as a smaller, more specific set of relevant tweets is generated. Advanced search on Twitter also bolds the keywords you have included in your search parameters so they are easier to find when scrolling through the list of tweets. Here is a picture for reference:

When it comes to containing and organizing all of your data, an easy way to do so is to take screenshots of the tweets you have decided to use and keep them in a folder on your desktop. That way, if you are including them in your paper you will have them compiled all in one place.
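Returning to the advanced search parameters described above: the same keywords, hashtags, and time periods can also be typed straight into the search box as a single query. The sketch below is only an illustration. It assumes Twitter's search still accepts the `since:` and `until:` date operators and quoted exact phrases, and `build_query` is a hypothetical helper name, not part of any Twitter tool.

```python
from datetime import date

def build_query(keywords=(), hashtags=(), since=None, until=None):
    """Assemble a search string from keywords, hashtags, and a date range.

    Hypothetical helper: it only builds text to paste into the search box.
    """
    parts = []
    for keyword in keywords:
        # Wrap multi-word keywords in quotes so they are searched as a phrase.
        parts.append(f'"{keyword}"' if " " in keyword else keyword)
    for tag in hashtags:
        # Accept hashtags written with or without the leading '#'.
        parts.append("#" + tag.lstrip("#"))
    if since is not None:
        parts.append(f"since:{since.isoformat()}")
    if until is not None:
        parts.append(f"until:{until.isoformat()}")
    return " ".join(parts)

# Example values are made up for illustration.
query = build_query(
    keywords=["climate", "carbon tax"],
    hashtags=["cdnpoli"],
    since=date(2018, 1, 1),
    until=date(2018, 6, 30),
)
print(query)
```

Keeping the query string in your notes next to your screenshots also records exactly how each set of tweets was found.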

Because Twitter is such a large database, a great way to synthesize the streams of tweets is by looking at Twitter Moments. Twitter Moments are compilations of tweets that Twitter’s algorithms have generated in order to provide a snapshot of what topics are currently trending. You can use Twitter Moments to narrow down the amount of data you have to look at, and this will make coding for key terms and facts to support your research a lot easier.

The Twitter Moments page lays out trending topics according to what is relevant to you locally, i.e. by your province or country of residence, or by what stories are playing out nationally. There is even an option to break down stories by category: news, sports, or entertainment.

If you have a Twitter account, you can also build your own Twitter Moment and use this function to collect tweets for your corpus. Hover over your profile picture and find the “Moments” function in the drop-down menu.

Most social media sites have search engines that allow users to pick out information that pertains to their area of interest. For those seeking visual qualitative data, Instagram has an “explore” page which allows users to search for posts by tags, according to how many posts contain a certain keyword following a hashtag.

“The explore page allows users to not only search by hashtags, but also trending places.” (Trusted Reviews)

Using online sources cuts out a lot of the search workload that traditional pen and paper coding requires.

Paper Coding: News and Magazine Articles

If the main material of your corpus happens to be news and magazine articles, the method I used is fairly simple but effective. Most of what I suggest here may seem obvious, but it is a reflection on what worked for me and helped to keep my work organized.

First, when reading through the articles, bookmark the ones you intend to use in your paper. Once you have collected your corpus, set aside time to code the articles in groups, depending on the size of the corpus.

The date your articles were published is usually pertinent to your analysis, so one way to organize them visually is to order your browser tabs from the oldest to the newest article. That way you move effortlessly in chronological order as you code, without having to click through each tab to work out which article came first.

Assuming the breadth of your corpus requires multiple sessions in order to code thoroughly, make sure to keep a list of all the articles you have chosen. Your list can be a digital or hard copy; I chose to do mine on physical paper. You can easily mark off the ones you have already coded and pick up right where you left off without having to tax your memory.

Now, moving on to my actual process of coding. I decided to simply copy and paste specific paragraphs and/or phrases from the articles onto a document page, always including the title, subtitle, date of publication and author.

Some sites are not going to allow you to copy and paste, so in that case you can screenshot or manually type out the quotes pertinent to your research question. I suggest typing them out so you can take advantage of all the document app’s features, including its search functions. As you are copying and pasting these paragraphs, underline, highlight, or bold the key features of language that are pertinent to your research. Also, most, if not all, document applications allow for comments, so you can use this function to jot down any immediate thoughts that come to mind concerning a linguistic feature or phrase. Make sure, once again, that your excerpts are chronologically ordered, as this makes sorting through them much easier.
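If you are comfortable with a little scripting, the same bookkeeping (title, subtitle, date of publication, author, excerpt, and a comment) can be kept as structured records that sort themselves by date. This is a minimal sketch and not the method I used; the `Excerpt` class, its field names, and the sample entries are all hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Excerpt:
    """One copied passage plus the details recorded alongside it."""
    title: str
    subtitle: str
    author: str
    published: date
    text: str
    comment: str = ""  # immediate thoughts, like a comment in a document app

# Placeholder entries for illustration only; they are not real articles.
excerpts = [
    Excerpt("Example Headline B", "Example subtitle", "Author Two",
            date(2018, 11, 2), "Second quoted paragraph.", "shift in pronouns"),
    Excerpt("Example Headline A", "Example subtitle", "Author One",
            date(2018, 3, 15), "First quoted paragraph."),
]

def chronological(items):
    """Return the excerpts ordered from oldest to newest."""
    return sorted(items, key=lambda excerpt: excerpt.published)

for excerpt in chronological(excerpts):
    print(excerpt.published.isoformat(), excerpt.title, "|", excerpt.comment)
```

Sorting on the publication date, rather than on the order in which you happened to copy things, means a late addition to the corpus still lands in the right place.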

Another key feature of digital documents is the “find” function (command+f or control+f). This can be especially helpful for quantitative aspects of your research, e.g. when you want to efficiently count the instances when a term is used. The “find” function can also be helpful for qualitative research, to document the collocation of certain words (i.e., what other words surround them). You can document your findings using the comment function again, or simply open a new document where you can neatly summarize your findings.
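For a larger corpus, the counting and collocation checks that “find” supports by hand can be sketched in a few lines of code. The example below is an illustration under simple assumptions: words are split on anything that is not a letter, digit, or apostrophe, and “surrounding words” means the two words on either side of the term. The function names and the sample sentence are made up.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def count_term(text, term):
    """Count whole-word occurrences of `term`, ignoring case."""
    return tokenize(text).count(term.lower())

def collocates(text, term, window=2):
    """Count the words found within `window` words of each use of `term`."""
    tokens = tokenize(text)
    target = term.lower()
    neighbours = Counter()
    for i, token in enumerate(tokens):
        if token == target:
            start = max(0, i - window)
            neighbours.update(tokens[start:i])                # words before
            neighbours.update(tokens[i + 1:i + 1 + window])   # words after
    return neighbours

sample = ("The new policy was announced today. Critics say the policy "
          "is unfair, but the policy stands.")
print(count_term(sample, "policy"))
print(collocates(sample, "policy").most_common(5))
```

Unlike a plain “find”, this counts whole words only, so a search for “policy” would not also pick up “policymaker”.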

There are many ways to code with pen and paper. I am a very visual person, so I like to see the data come to life before me. The first step that I take is to gather a wide variety of coloured pens and highlighters and decide which colour is going to represent which type of data. For example, if you are examining the words that an author uses to refer to themselves versus the words that they use to refer to others, one colour could identify instances of self-reference and another colour could identify instances of reference to others. Then I go through my corpus, focusing on one colour at a time, highlighting anything that I think could fit into the category of each colour. I then read through my corpus again, looking for any data that I may have missed on my first run through. Your paper may end up looking a bit messy, but that is ok.

After I have highlighted all of the information that I need, I put it into a data table so I can see how it all comes together, counting the occurrences of each colour and marking down what language is used. Thus, whether my research is qualitative or quantitative, I have the corpus data all ready to use. After I have finished gathering my data and putting it into a readable format, I am able to move on to the analysis part of the process so that I can start to build my argument.
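If you type your highlighted items into a list, the data table described above can also be tallied automatically. This is only a sketch: the category labels stand in for highlighter colours, and the coded items are invented for illustration rather than taken from a real corpus.

```python
from collections import Counter, defaultdict

# Each entry pairs a category (one highlighter colour) with the wording
# that was highlighted. These items are invented examples.
coded = [
    ("self-reference", "I"),
    ("self-reference", "we"),
    ("other-reference", "they"),
    ("self-reference", "I"),
    ("other-reference", "those people"),
]

def tally(coded_items):
    """Return per-category totals and, for each category, counts per wording."""
    totals = Counter(category for category, _ in coded_items)
    wording = defaultdict(Counter)
    for category, phrase in coded_items:
        wording[category][phrase] += 1
    return totals, wording

totals, wording = tally(coded)

# Print one table row per category: label, total, and the language used.
for category, total in totals.most_common():
    phrases = ", ".join(f"{p} ({n})" for p, n in wording[category].most_common())
    print(f"{category:<16} {total:>3}  {phrases}")
```

The totals serve the quantitative side of a project, while the list of wording in each row keeps the actual language in view for qualitative analysis.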

#PublicPedagogy #WRDS350 #CDNWRDS #WritingStudies #StudentResearch #AcWri
