A1BG gene transcription programming
Editor-In-Chief: Henry A. Hoff
Computer programs to search a nucleotide sequence along a DNA strand can be written in many languages. Here is an example written in BASIC:
<source lang="cpp">
# load "/ChipmunkBasic/Chipmunk_Basic_3.5.7/GeneProject/SuccessablesAGC--.bas"
# This program tests the discovery of AGC (AGCs).
1 dim indicator$(100),molecule$(100)
# This version of Successables.bas starts with the default genome option.
2 input "Start Successables execution from beginning (b), last stop (L), water exchange (w), stop (s)? >> ";decision1$
# For the negative strand (ZSCAN22 to A1BG, in the negative direction), use "--nt.bas".
8 if decision1$ = "b" then goto 10
9 if decision1$ = "s" then goto 100000
10 open "/ChipmunkBasic/Chipmunk_Basic_3.5.7/GeneProject/Gene Successions/SuccessableIsoforms.bas" for input as #1
11 dim sgeneisoformid$(40000),bb$(40000)
12 input #1,successables
13 for i = 1 to successables
14 input #1,sgeneisoformid$(i)
15 next i
16 close #1
17 print "Successables array has been inputted."
# This subroutine goes one by one through each successable isoform.
# Do not use the index i for anything else.
20 for i = 1 to successables
21 file$ = sgeneisoformid$(i)+"--nt.bas"
22 file2$ = "/ChipmunkBasic/Chipmunk_Basic_3.5.7/GeneProject/Nucleotide Sequences/"+file$
23 open file2$ for input as #1
24 dim numbertypes,nts$(40000)
25 input #1,numbertypes
26 for j = 1 to numbertypes
27 input #1,nts$(j)
28 next j
29 close #1 : poll = 0 : goto 30
# This is the 26 subroutine.
# AGC boxes (AGC)s 3'-AGCCGCC-5'
30 open "/ChipmunkBasic/Chipmunk_Basic_3.5.7/GeneProject/Gene Successions/AGC--.bas" for input as #1 : print "Working on the AGC box!"
31 input #1,agcboxnumber
32 close #1
33 if agcboxnumber = 0 then poll = 1 : goto 280
# This program tests the discovery of AGC boxes.
# The 2 subroutine.
# Check to see if AGC already in file.
# This 2 subroutine loads in gene isoforms and tests AGC--.bas.
40 dim geneisoform$(40000), agcbox$(40000)
41 dim indexjagcbox(40000)
42 open "/ChipmunkBasic/Chipmunk_Basic_3.5.7/GeneProject/Gene Successions/AGC--.bas" for input as #1
43 input #1,agcboxnumber
44 for m=1 to agcboxnumber
45 input #1,geneisoform$(m)
46 input #1,agcbox$(m)
47 input #1,indexjagcbox(m)
50 next m
51 close #1 : if poll = 1 then goto 54
54 for m=1 to agcboxnumber
55 if geneisoform$(m) = sgeneisoformid$(i) then poll = 2 : goto 100000
56 next m
# If the geneisoform is not in AGC--.bas then send program to find AGCs.
57 poll = 1 : goto 280
# This is the 280 subroutine.
# Find all possible AGC: 3'-0A-1G-2C-3C-4G-5C-6C-5'.
# Once found repeat entry must be prevented!
# Any time n > 0, restartagcboxj should be value so that j=restartagcboxj + 1 is the correct restarting value.
280 n=0 : box$="3'-" : indexagcboxj=1 : j=0 : restartagcboxj = 0
# Send computer to see if AGC already found. This is 281 to 306.
281 goto 307
# Recover indexagcboxj. Limit on n is 7.
282 for j = indexagcboxj to numbertypes
283 if n = 1 then goto 291
284 if n = 2 OR n = 3 OR n = 5 OR n = 6 then goto 294
285 if n = 4 then goto 297
288 if n > 0 then goto 295
289 if nts$(j) = "A" then goto 301
290 goto 296
291 if nts$(j) = "G" then goto 301
292 if n = 1 then j = j - 1
293 goto 296
294 if nts$(j) = "C" then goto 301
295 j = restartagcboxj - 1
296 n=0 : box$="3'-" : goto 304
297 if nts$(j) = "G" then goto 301
298 goto 295
301 n=n+1 : box$=box$ + nts$(j)
302 if n = 2 then restartagcboxj = j
# When an AGC has been found, first store the isoform and the AGC.
# Then send the computer to 100000.
303 if n = 7 then goto 306
# For ZSCAN22 to A1BG use limit of 4560, but for ZNF497 to A1BG use 958.
304 if j = 4560 OR j > 4560 then goto 100000
305 next j
# Store isoform and its AGC.
306 box$=box$ + "-5'" : indexagcboxj = j
307 open "/ChipmunkBasic/Chipmunk_Basic_3.5.7/GeneProject/Gene Successions/AGC--.bas" for input as #1
308 input #1,agcboxnumber
309 close #1
# Check to see if AGC element already in file.
310 if agcboxnumber = 0 AND n = 0 then goto 282
311 goto 332
312 agcboxnumber = agcboxnumber + 1
313 open "/ChipmunkBasic/Chipmunk_Basic_3.5.7/GeneProject/Gene Successions/AGC--.bas" for output as #2
314 print #2,agcboxnumber
315 print #2,sgeneisoformid$(i)
316 print #2,box$
317 print #2,indexagcboxj
318 close #2
319 goto 295
# Direct computer to DCE.
320 goto 100000
# Check to see if AGC already in file.
327 if indexj < 4560 then goto 332
328 for m=1 to agcboxnumber
329 if geneisoform$(m) = sgeneisoformid$(i) then goto 100000
331 next m
332 agcboxnumber = agcboxnumber + 1
333 dim geneisoform$(40000), agcbox$(40000)
334 dim indexjagcbox(40000)
335 geneisoform$(agcboxnumber) = sgeneisoformid$(i)
336 agcbox$(agcboxnumber) = box$
337 indexjagcbox(agcboxnumber) = indexagcboxj
338 open "/ChipmunkBasic/Chipmunk_Basic_3.5.7/GeneProject/Gene Successions/AGC--.bas" for output as #2
339 print #2,agcboxnumber
340 for m=1 to agcboxnumber
341 print #2,geneisoform$(m)
342 print #2,agcbox$(m)
343 print #2,indexjagcbox(m)
344 next m
345 close #2
346 goto 295
347 next i
100000 end
</source>
In the file that is loaded into the interpreter, lines beginning with "#" (without the quotes) are non-executed comments for the programmer.
To load the program, copy only the "load ..." portion of the first line and enter it at the interpreter prompt, shown as ">".
Then type "run" and answer the questions.
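Since such a search can be written in many languages, the core step of the listing (scanning a nucleotide sequence for the AGC box 3'-AGCCGCC-5') can also be sketched in Python. This is a minimal sketch under stated assumptions, not a translation of the BASIC program: the names `find_boxes`, `AGC_BOX` and `SCAN_LIMIT` are hypothetical, the sequence is assumed to be a list of one-letter nucleotides like the `nts$` array, and the sketch reports the 1-based position where each box starts, which may differ from the index the listing stores. File handling and the per-isoform bookkeeping are left out.

```python
AGC_BOX = "AGCCGCC"   # motif named in the BASIC comments: 3'-AGCCGCC-5'
SCAN_LIMIT = 4560     # scan limit the listing uses for ZSCAN22 to A1BG


def find_boxes(nts, motif=AGC_BOX, limit=SCAN_LIMIT):
    """Return (start_position, labelled_box) pairs for each motif match.

    nts   : sequence of one-letter nucleotides, e.g. ["A", "G", "C", ...]
    limit : only the first `limit` nucleotides are searched
    Positions are 1-based, to match the BASIC array indexing.
    """
    sequence = "".join(nts)[:limit]
    hits = []
    start = sequence.find(motif)
    while start != -1:
        hits.append((start + 1, "3'-" + motif + "-5'"))
        # Resume one position later so no later match is skipped.
        start = sequence.find(motif, start + 1)
    return hits
```

For example, `find_boxes(list("TTAGCCGCCTT"))` returns one match starting at position 3. For the ZNF497 to A1BG direction, the listing's comment suggests passing `limit=958` instead.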