Friday, 9 January 2015

2014 year in review... now with more hindsight!!!

After a pretty heavy year on the politics side, some of us are still standing.

The labs are still here, (most) of the staff are still here and another cohort of students have launched successfully.  I'm calling it a win.... but at a cost.

The Uni has been through the wringer with the Federal Budget shenanigans.  Yet another round of funding cuts, impending funding cuts and systematic changes, topped off with a whole lot of drawn-out uncertainty.  This has resulted in a general grinding-down of the staff and a big overdraft on the goodwill that keeps the place running.

Another round of accreditation has been navigated, with high praise for the quality of the course and the staff.  The politics around the structure of the school aside, it was fairly painless.  However, the tension between the idealism of the APAC rules and the reality of where higher education is going has continued.  This tension mostly plays out between APAC's idealistic staffing model and the reality of the downsizing pressure that all the Unis are under.

It's always interesting to spectate on these kinds of processes: watching the slow evolution of the professional organisations as they struggle not to modernise, while the commercial reality of operating in an environment that in no way resembles the 1970s grinds against the mindset that era fostered (and educated), the same mindset that creates the drag on any attempt to change.

Don't get me wrong, I appreciate some idealism in all things. But the reality is that a combination of factors and players are working at cross purposes to keep many of the professions (like psych) functioning as though it were 1956. The problem is that this mostly results in systems that are completely non-viable within the current funding envelopes.  Genius!

Before I start ranting too much.....

On the research side, there are lots of projects bubbling along.  All manner of bits and pieces need building.  I currently have about 14 projects on my to-do list, though a few of them I would count as inactive at the moment, pending further interest from the clients. I need to clear them before the honours students get too busy.

We have a good active cohort of PhD students at the moment with a number in the end game. (This will clear a couple of desks ... I hope)  Not sure how many will be coming on board this year so it will be interesting to see how the desks play out. I can always stack a couple in a spare staff office I guess.

Speaking of staff, we lost a number of core staff over the last year through reorganisation and redundancies.  Organisational change is always difficult.  There is loss and sadness along with renewal and hope.... but that doesn't mean we don't miss them.

The Uni is continuing its constant evolution (like any reactive organisation) and we are charging ahead at a considered pace.  Not fast enough for many but too fast for some.  There are new administration systems, more online courses and new campus locations.  I personally am not missing some of the old administration systems; they were very aged in many of their design assumptions and well past time for replacement.  But the trade-off is the pain of learning the new systems and figuring out how to get stuff done with them.

We have lost a number of admin staff and the rest have been centralised.  This has created some renewal but also a concentration of pressure on the remaining staff.  They have also been loaded with the new software systems, more duties (from the redundant staff) and a constantly evolving policy landscape and organisation which is making it very difficult to form new knowledge networks.  Hopefully, we will have some time to stabilise and consolidate this year so that they can get over the learning curve of the new systems and find the productive sweet spots.

Just a few weeks now until the new honours cohort lands... time to shine the brass and grease the hinges....

Monday, 25 August 2014

Automating the export of edat2 files from E-DataAid


So, you have a data set collected in E-Prime and you want to export it for processing in another tool suite?  Since it takes half a dozen mouse clicks per file, you can only imagine how un-fun this gets once the data set grows to dozens or hundreds of files.

There are three methods for exporting a data set of edat2 files that I know of:

1) Manually click your way through the whole data set and export each file individually.
2) Use E-Merge to merge all the files into one huge file... then export it in one step... then figure out how to process it (either split it back into individual participant chunks or do a population study on the whole lot).
3) Use the script below to get it all done automagically in one mouse click.

My problem is that each year I have dozens of E-Prime data sets to handle.  This has resulted in needing to process literally hundreds of edat2 files every year.  I usually teach the students how to do the manual export, but that is still prone to human error and often results in me having to go and find a file that was misnamed or skipped in the process.  More time and effort spent.  This is a repetitive problem with no variability... there had to be an automated solution.

So after appropriate googling around and some futile AutoHotkey hacking, I contacted PST support and found a helpful customer support person.  After explaining my need and establishing that the first two solutions above were not suitable for my problem, the support person produced some documentation for a simple command line interface to E-DataAid.  This gave me the scriptable interface that I needed.  Add a little Perl hacking and tada.... solution.

Find below my solution and some notes.

The Solution for Exporting edat2 files to text files

For this solution I used Perl as my scripting language of choice. I recommend ActiveState ActivePerl.

To use this script you will need Perl and E-Prime 2 installed on the computer.  Simply copy the script below into a text file, name it something useful like "DumpEdatToText.pl", and save it in the directory with the edat2 files.

Then double-click the Perl script to run it. It should export each edat2 file to a tab-delimited text file named with the participant ID and session ID, for example "p1s1.csv".  (Note: while this is not strictly speaking a CSV file, the file extension makes it easy to import into Excel for my purposes.)
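As a usage sketch (the folder name here is just a made-up example), you can also run the script from a command prompt instead of double-clicking it, which makes any error output easier to see:

cd D:\MyStudy\Data
perl DumpEdatToText.pl

The script echoes each file as it goes (e.g. "nBackVerbal-1-1.edat2 being processed") and leaves p1s1.csv, p2s1.csv and so on next to the original edat2 files.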

Following is my perl script:

use strict;
use warnings;

#get all the edat2 files in the current directory
my @files = glob("*.edat2");

#process each file individually
foreach my $file (@files){

    print $file . " being processed\n";

    #get the participant ID and session ID from the file name
    #(assumes names of the form TaskName-<participant>-<session>.edat2)
    my $p = "partID not found";
    if ($file =~ /\-([^-]+)\-/)
    {
        $p = $1;
    }

    my $s = "SessionID not found";
    if ($file =~ /\-([^-]+)\./)
    {
        $s = $1;
    }

    my $outfileName = "p" . $p . "s" . $s . ".csv";

    #write the command file that tells E-DataAid what to export
    my $theCommandFileName = "cmdFile.txt";
    unless( open cmdFile, '>:crlf', $theCommandFileName){
        die "\nUnable to open $theCommandFileName\n";
    }

    print cmdFile ("Inheritance=true" . "\n");
    print cmdFile ("InFile=" . $file . "\n");
    print cmdFile ("OutFile=" . $outfileName . "\n");
    print cmdFile ("ColFlags=0" . "\n");
    print cmdFile ("ColNames=1" . "\n");
    print cmdFile ("Comments=0" . "\n");
    print cmdFile ("BegCommentLine=" . "\n");
    print cmdFile ("EndCommentLine=" . "\n");
    print cmdFile ("DataSeparator=\t" . "\n"); #tab character
    print cmdFile ("VarSeparator=\t" . "\n");  #tab character
    print cmdFile ("BegDataLine=" . "\n");
    print cmdFile ("EndDataLine=" . "\n");
    print cmdFile ("MissingData=" . "\n");
    print cmdFile ("Unicode=1" . "\n");

    close cmdFile;

    #pass the command file to E-DataAid for export
    system('"C:\Program Files (x86)\PST\E-Prime 2.0\Program\E-DataAid.exe" /e /f cmdFile.txt');
}


End Perl Script
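For reference, this is the command file the script writes for an input file named nBackVerbal-1-1.edat2 (the same example file name used in the notes below). The DataSeparator= and VarSeparator= lines each end in a single tab character, which does not show up well here:

Inheritance=true
InFile=nBackVerbal-1-1.edat2
OutFile=p1s1.csv
ColFlags=0
ColNames=1
Comments=0
BegCommentLine=
EndCommentLine=
DataSeparator=
VarSeparator=
BegDataLine=
EndDataLine=
MissingData=
Unicode=1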

Notes

Error Messages from E-DataAid
Error 1
No Output and no error message generated when running the following command
"C:\Program Files (x86)\PST\E-Prime 2.0\Program\E-DataAid.exe" /e  exampleB.txt

Solution – forgot to include the /f flag on command line. Now works.
"C:\Program Files (x86)\PST\E-Prime 2.0\Program\E-DataAid.exe" /e /f exampleB.txt
The missing flag should generate an error message.

Error 2
Message: “Error Reading Unicode from command file”
Solution – Added “Unicode=1” to end of command file.  Guessed that it’s a flag field. Seems to work.

Error 3
Message: “Error Reading E-Prime data file: D:\TestDumper\test.edat2 file” 
Solution – Had the full path to the file in the command file. Replaced with a relative path and the command worked. 
InFile=D:\TestDumper\test.edat2  <-Failed
InFile=nBackVerbal-1-1.edat2 <- Worked

Error 4
Leaving the “InFile=” field empty will crash E-DataAid
E-DataAid should correctly handle and report the missing field.

Error 5
Leaving the “OutFile=” field empty will generate a spurious error message.  “Error exporting to text file: .” 
E-DataAid  should detect the missing field and correctly report that there is no output file name supplied.

Error 6
An absolute file path to the edat2 file in the “InFile=” field causes a failure.
This may be because E-DataAid assumes the file is in the current working directory.
Solution – Use a relative path or work in the same directory.

Wednesday, 30 July 2014

Research Instrument Design and Anonymity

Here's a scenario...

Your research design means you are collecting data using an online survey instrument.  You have ethical clearance to collect data from your participant population anonymously. You then embed a question in the survey asking the participants to provide their email address if they would like a summary of the results of the research; as per your ethical obligation.

Later....

When looking at the raw data from your instrument, you see a row of the responses provided by the participant... next to their email address. 

Now tell me how this is anonymous... and show your working!

Why is this a problem?

1) You know the population that was invited to participate.
2) You know some or all of the email addresses of those participants. By elimination you may be able to guess more identities. With additional demographic information you can further refine your guesses.
3) You now need to store and secure the data from your instrument at a much higher level of security.
4) You are now required to store ALL this data for seven years in such a way that it can be reviewed by third parties (who you do not know) at any time in the future.
5) This data exists on multiple computer systems already.
6) This data may leak in a number of obvious and non-obvious ways.
7) You are legally obliged to handle and secure this data as of 12/3/2014.
8) It's really hard to be sure that data is actually deleted. Sooner or later, a search engine will find it.

How is this ethical research?  How are you in compliance with the NHMRC guidelines?  How are you in compliance with the Australian Privacy Legislation? How are you in compliance with the University Policies? 

The government has passed some updates to the privacy laws and now they have real and specific applicability to this data.

Penalties for non-compliance

You are individually liable for penalties if you do not comply (up to $220,000 for individuals and $1.7 million for organisations... i.e. your employer, the University).  See the Privacy Legislation below.

What is Personal Information?

 Personal information has the meaning as set out in s 6 of the Privacy Act:
information or an opinion (including information or an opinion forming part of a database), whether true or not, and whether recorded in a material form or not, about an individual whose identity is apparent, or can reasonably be ascertained, from the information or opinion.
Sensitive information is a subset of personal information. The Privacy Act defines sensitive information as:
  1. information or an opinion about an individual’s:
    1. racial or ethnic origin; or
    2. political opinions; or
    3. membership of a political association; or
    4. religious beliefs or affiliations; or
    5. philosophical beliefs; or
    6. membership of a professional or trade association; or
    7. membership of a trade union; or
    8. sexual preferences or practices; or
    9. criminal record;
    that is also personal information; or
  2. health information about an individual; or
  3. genetic information about an individual that is not otherwise health information.

What are your obligations once you collect Personal Information in a research data set?

  • Securely storing the data (copying, publishing, data requests, integrity investigations)
  • Securely destroying the data
  • Authentication of access to the data
  • Mandatory reporting of data breaches
  • Transfer of control of the data
  • Hosting the data on foreign servers
  • Implications of the Freedom of Information Act
The list of obligations and the cost of compliance are quite substantial..... are you sure you want to collect this information for your research project?

Academic email is subject to the Freedom of Information Act under Australian law.  This means that if your data set has been emailed (say, between the student and their supervisor) then it could be leaked via that mechanism.

What can Students and Supervisors do?

Avoid this whole mess by not collecting personally identifying information (Email Addresses specifically) as part of the data set. Design around this potential risk.

Do not sample very small, specific, known populations. 

Beware when the Ethics review hands back a requirement for this kind of mechanism in your study.  Be prepared to push back with some alternate design strategies to avoid this problem.

Be aware of the legislation and the implications of compliance.  

Design research to separate identity and data if the participants are known.  Do not embed their identities in the data set or research materials (which must then be stored and shared).

Alternate Design Strategies

Case 1 - Ethical Requirement for optional feedback of research results to research participants.

The strategy I recommend is to provide the participants with a contact email address (researcher or supervisor) from whom they can request a copy of the results of the research.
This strategy avoids the issue of collecting and holding a list of email addresses, with its associated cost and the risk of violating the anonymity of the participants.

Case 2 - Repeated measures design requiring followup contact with participants.

Request participants to contact the researcher and be added to a pool prior to data collection starting.  The researcher can then broadcast an anonymous link to the data collection instrument to this list at each measurement time.

This provides a disconnect between the participants' activity and their identity.  Unless the researcher has a very small pool or makes other attempts to link the participant and their data... there is no way to identify who has provided which data record.

This also allows the list of email addresses to be stored separately and destroyed securely, independently of the data set that results from the research.


Further Reading

10 Steps to Protect Other People's Personal Information
http://www.oaic.gov.au/privacy/privacy-resources/privacy-fact-sheets/other/privacy-fact-sheet-7-ten-steps-to-protect-other-people-s-personal-information

How to de-identify data
http://www.oaic.gov.au/privacy/privacy-resources/privacy-business-resources/privacy-business-resource-4-de-identification-of-data-and-information

The 13 Australian Privacy Principles (APPs)

General information on Information security
http://www.oaic.gov.au/privacy/privacy-resources/privacy-guides/guide-to-information-security



Thursday, 5 June 2014

Random Stimuli Sequences

When is a sequence of stimuli "Random" enough?

Students often have a preconceived idea of what they think "random" looks like, which, if you think about it for a second, is interesting in itself.  (Go look up Apophenia)

The correct answer is: when the sequence appears to have no pattern perceivable by the participant!

Note that this is not the same as "No pattern perceivable to the researcher".  The researcher is often conditioned to the sequence simply by their understanding of the research design and having tested the experiment a few (or hundreds) of times.  Their brain is already trying to detect patterns in the sequence.  This is what brains do.  

This is always in contrast to a naive participant, who will only experience the sequence once.  (If your design calls for repeated measures... that's a different design.)


What are the options?


Random Sequence With Replacement


Imagine a bag that contains all the possible sequence items: reach in and take an item without looking, record the item and then return it to the bag.  Repeat as needed.

This method uses a sampling mechanism of reaching blindly into a bag, which means that each sample is independent.

For example, if we have five possible items in the bag (5,9,4,2,7), then a sequence of five samples could be any of the following:

5,9,4,2,7

5,5,5,5,5

7,7,7,7,2

5,9,5,9,5

While sequences like these, which appear to contain patterns, are "legal" and can quite possibly be generated by this mechanism, there are 5 * 5 * 5 * 5 * 5 (3125) possible different sequences that could be generated.

Much like a coin-walk, the distribution of the items in this random system will approach being perfectly even... as the sample size grows... but it will not be anything like even at small sample sizes. Keep in mind that for any participant, who experiences the sequence once, this is a sample size of 1... i.e. lots of noise in the distribution. Because of this, sequences that appear to have a pattern are generally not what the researchers "want" to see.  And while this is still a very effective mechanism for generating the stimuli sequence, the possibility of an apparent pattern can mess with the researcher's head.
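Here is a minimal sketch of this mechanism in Perl (my own illustration, not tied to any particular experiment package):

use strict;
use warnings;

my @items  = (5, 9, 4, 2, 7);   # the "bag" of possible items
my $length = 5;                 # how many samples to draw

my @sequence;
for (1 .. $length) {
    # each draw is independent: pick a random index and leave the bag unchanged
    push @sequence, $items[ int(rand(@items)) ];
}

print join(",", @sequence), "\n";   # e.g. 7,7,5,2,7 -- repeats are perfectly legal

Run it a few times and you will see both the "nice looking" sequences and the ones that make researchers twitch.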

Random Sequence Without Replacement

Imagine a bag that contains all the possible sequence items: reach in and take an item without looking, record that item but do not return it to the bag.  Now pick the next item from those remaining in the bag.  Obviously the bag will eventually be exhausted.  (At which point your sequence may be finished, or you may return all the items to the bag and start again.)

For instance, again with our items of ( 5,9,4,2,7), some sequences could be

7,5,9,2,4

2,4,5,7,9

There are 5 * 4 * 3 * 2 * 1 (120) possible sequences if we are creating five item sequences.

If we are creating, say, a ten-item sequence, then we would run this method twice,

i.e. 2,4,5,7,9,7,5,9,2,4

which would give us 120 * 120 (14,400) possible sequences.

This mechanism will generate sequences that do not have the same possibility of repetition of items as the first mechanism. 

The distribution of items in this method is much closer to even at small sample sizes, and so creates a safe feeling for the researchers.  However, keep in mind that where the number of items in the bag is small, this can create a repeating sequence that the participants can still detect.  (Seven plus or minus two is a useful rule of thumb for the number of items a person can remember.)

Generally, if the number of sequence items is less than 9, the participant can start to anticipate, by elimination, what item may come next.  This is unavoidable, as it's part of the "normal" function of the brain. The longer the sequence runs... the better they will get at this.  (This does not mean they will be right... but over time, it can be better than chance for people who are good at this task.)
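A matching sketch for the without-replacement case (again my own illustration), using the Fisher-Yates shuffle from List::Util and simply re-shuffling the bag when a longer sequence is needed:

use strict;
use warnings;
use List::Util qw(shuffle);     # shuffle() is a Fisher-Yates shuffle

my @items  = (5, 9, 4, 2, 7);   # the "bag" of possible items
my $length = 10;                # e.g. two passes through a five-item bag

my @sequence;
push @sequence, shuffle(@items) while @sequence < $length;   # refill the bag as needed
@sequence = @sequence[0 .. $length - 1];                     # trim any overshoot

print join(",", @sequence), "\n";   # e.g. 2,4,5,7,9,7,5,9,2,4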


So what is the best mechanism to generate your stimuli sequence?

Well, this gets tricky, because stimuli sequences often involve repetition of some items, some contain distractor items with their own frequency, and others include cueing rules and other rules governing order.  Some sequences include "idiot check" items, some are trying to cause patterns and anticipation while others are trying to control for these effects.... some have blocks of similar stimuli, some have intruders, control blocks and neutral-effect stimuli.  There are as many permutations as there are researchers and research designs.
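As one toy illustration of the kind of extra rule a real design might impose (purely my own example, not a general-purpose generator), here is a rejection-sampling version of the shuffle above that refuses any ordering where the same item appears twice in a row. Note that this style of generator can loop forever if the constraint is too strict for the item set:

use strict;
use warnings;
use List::Util qw(shuffle);

# two repetitions of each item, e.g. a design where every item appears twice
my @items = (5, 9, 4, 2, 7, 5, 9, 4, 2, 7);

my @sequence;
do {
    @sequence = shuffle(@items);
    # keep shuffling until no item is immediately followed by itself
} while ( grep { $sequence[$_] == $sequence[$_ - 1] } 1 .. $#sequence );

print join(",", @sequence), "\n";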

Find some experts and talk it over.  There is a sequence generator for you out there.

The biggest problem, however, is when the right generator has been built and verified and the researcher says "I don't think it's random enough...."


Wednesday, 4 June 2014

SCU Intellectual Property Rights in Research Projects

http://policies.scu.edu.au/view.current.php?id=00017

Optical Illusion Resources

http://news.distractify.com/culture/mind-blowing-optical-illusions/?v=1

Copyright, Video and Model Releases for Research Purposes


The current copyright legal framework: the general rule is that you need to obtain permission from the rights holder to use their material.  This may include payment for a license to use it.  There are some exemptions under the Copyright Act for education and research purposes.

COPYRIGHT ACT 1968 - SECT 40 

Fair dealing for purpose of research or study

There are various issues that have arisen since the Copyright Act was written (digital technology, for example) that are more complex to argue.  There are proposals to change the copyright system, but they have not yet reached the legislation stage.

ALRC Proposals to change Copyright - Educational Use section


Copyright and the Digital Economy (DP 79) 5 June 2013

Fair Use and various examples...



Taking photographs or video for use in research projects requires an SCU model release form signed by the model. This needs to be archived with your project records.

Talent Release /Permission to Use Form