A Modular VME Or IBM PC Based Data Acquisition System For Multi-Modality PET/CT Scanners Of Different Sizes And Detector Types
D Crosetto. A Modular VME Or IBM PC Based Data Acquisition System For Multi-Modality PET/CT Scanners Of Different Sizes And Detector Types. The Internet Journal of Medical Technology. 2002 Volume 1 Number 1.
A modular, digital, fully programmable and scalable system has been designed for VME and IBM-PC based platforms. It serves a multi-modality 3-D Complete Body Scan (3D-CBS) that combines Positron Emission Tomography (PET) and Computed Tomography (CT) in one unit with no moving parts. The detector is open, to accommodate claustrophobic or overweight patients, with the option of closing it to increase efficiency. This device fully exploits the double photon emission and allows for: annual whole-body screening for cancer and other systemic anomalies; only 1/30 the radiation dosage; a reduction in scan time to 4 minutes for a Field of View (FOV) of 137.4 cm as opposed to 55 minutes for a FOV of 70 cm; a decrease in examination cost by 90%; and an increase in sensitivity, providing physicians with additional clinical information on a specific organ or area and contributing to the specificity in detecting and assessing cancer. These advantages allow for early detection, the best way to defeat cancer. The system collects digital data from multiple electronic channels. Each electronic channel carries the information (64 bits) of all sensors included in a given view angle of the detector. The 64-bit data packets acquired at 20 MHz by each channel with zero dead time are correlated with neighboring information and processed in real time by a 3D-Flow DSP to improve the signal-to-noise ratio and to extract and measure particle properties, resulting in the identification of the particles' position, accurate energy measurement, Depth of Interaction (DOI), and timing measurements. A thorough real-time algorithm that best identifies the photons can be executed because the 3D-Flow sequentially-implemented, parallel architecture (SIPA) allows the processing time in a pipeline stage to be extended beyond the time interval between two consecutive input data.
Very low power consumption drivers drive short, equal-length PCB traces between 3D-Flow chips, solving the problems of signal skew, ground bounce, cross-talk and noise. The electronics validates and separates events from the different modalities (PET/CT); PET events are checked for coincidences using a circuit sensitive to radiation activity rather than the number of detector elements. Both PET and CT examinations occur at the same time in a stationary bed position using a detector with a long FOV, avoiding motion artifacts, increasing throughput, reducing examination cost, reducing radiation to patients, increasing resolution, improving data quality, and reducing erroneous readings (false positives). The saturation of the electronics of current PET is eliminated by using a system with an input bandwidth of 35 billion events per second distributed over 1,792 channels. The output bandwidth is selectable to sustain the activity generated by the maximum radiation that a PET/CT should ever receive. All events are gathered by an IBM PC or VME CPU board, formatted and sent to the image processing workstation. The entire system can be simulated from top level to silicon gate level before construction.
Positron Emission Tomography (PET) medical imaging shows functional imaging at the molecular level as opposed to the anatomical imaging of other devices (e.g., x-ray computed tomograph or CT). It can provide an
Radioactive fluorine isotopes are attached to glucose (sugar) to make a compound called fluorodeoxyglucose (FDG). FDG is injected into the patient and is absorbed in the same manner as normal glucose.
The human body produces usable energy by metabolizing glucose (sugar). Different cells metabolize glucose at different rates. The biochemical processes of the body's tissues are altered in virtually all diseases, and PET detects these changes by identifying areas of abnormal metabolism as indicated by high photon emission. Cancer cells, for instance, typically have much higher metabolic rates because they are growing much faster than normal cells and thus absorb more FDG (60 to 70 times more) than normal cells and emit more positrons. Detecting these changes in metabolic rates with PET enables physicians to find diseases at their very early stages, since in many diseases the metabolism of the cells changes before the cells are physically altered. These physical alterations can be revealed by CT, Magnetic Resonance Imaging (MRI) or x-rays.
It is possible to reveal the molecular pathways of the FDG (fluorine + sugar) because the radioactive fluorine isotope emits a positron (or anti-electron) that travels a few millimeters until it meets a free electron, at which point mutual annihilation takes place. The masses of the positron and electron are converted to electromagnetic radiation: two gamma rays (photons). The total energy of the two photons equals the rest-mass energy of the original electron and positron (511 keV each, so each photon carries 511 keV), and they are emitted in diametrically opposed directions.
A PET device is a set of detectors coupled to sensors that surround the human body and send the signals generated by the incident photons into the detector to be identified by special electronics. (See Figure 1). The photons are emitted (emission mode) inside the patient's body at a rate up to hundreds of millions per second. When the 511 keV gamma ray pair is simultaneously recorded by opposing detectors, an annihilation event is known to have taken place on a line connecting the two detectors. This line is called the “Line of Response” (LOR).
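The coincidence principle described above can be illustrated with a minimal Python sketch. It is not the coincidence circuit of the 3D-CBS: the hit list, the function name `find_coincidences`, and the 3 ns window are hypothetical values chosen only for illustration, and a real system would also check energy and that the two detectors face each other across the patient.

```python
from typing import List, Tuple

Hit = Tuple[float, int]  # (arrival time in ns, detector index)


def find_coincidences(hits: List[Hit], window_ns: float) -> List[Tuple[int, int]]:
    """Pair consecutive hits whose arrival times differ by at most window_ns.

    Each returned (detector_a, detector_b) pair defines one Line of Response.
    """
    ordered = sorted(hits)  # sort by arrival time
    lors = []
    i = 0
    while i < len(ordered) - 1:
        t_first, det_first = ordered[i]
        t_second, det_second = ordered[i + 1]
        if t_second - t_first <= window_ns and det_first != det_second:
            lors.append((det_first, det_second))
            i += 2  # both hits are consumed by this coincidence
        else:
            i += 1  # unpaired single, move on
    return lors


# Hypothetical hits: two coincident pairs and one unpaired single.
hits = [(100.0, 3), (101.5, 47), (250.0, 12), (400.0, 8), (402.0, 60)]
lors = find_coincidences(hits, window_ns=3.0)
print(lors)  # [(3, 47), (8, 60)]
```

Each pair in the output names the two detectors that bound one LOR; the hit at 250 ns has no partner within the window and is discarded as a single.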
The signals received from the detector are processed by the electronics, and the extracted information of the energy, photon's arrival time and spatial location of the hit are sent to the workstation which reconstructs the image of the emitting radioactive source. The intensity of each picture unit (pixel) is proportional to the isotope concentration at that position in the human body (which, of course, corresponds to and is caused by higher metabolism of sugar in that area). Thus, PET exploits differences in rates of absorption of FDG (normal or abnormal) in cells in different parts of the body to provide information useful in recognizing the presence of a disease.
Only a good quality image from a PET with high sensitivity and without motion artifacts, interpreted by an experienced radiologist, can indicate whether and to what extent certain diseases are present in the body.
In addition to the FDG radioisotope used in neurology, cardiology and oncology, other common radioisotopes used are 13N-ammonia and 82Rb in cardiology, 15O-water in neurology and psychiatry, and 11C-methionine in oncology. (See Section XIII for production cost of the radioisotopes and their half-lives).
A PET examination can detect cancer and indicate if a primary cancer has metastasized to other parts of the body. It replaces multiple medical testing procedures with a single examination, and in many cases, it diagnoses diseases before they show up in other tests or with other imaging devices.
The Computed Tomograph (CT) measures the density of the tissue by sending lower energy x-rays (60 to 120 keV) through the patient's body (in transmission mode) and computing their attenuation on the other side.
Combining different technologies in one device further assists physicians in clinical examinations. Viewing PET functional imaging data in conjunction with CT morphologic cross-sectional data is sometimes mandatory if lesions are found. It is possible to combine PET and CT technology in a single 3D-CBS multimodal device. Because both the CT and PET have several parts in common (i.e., detectors, mechanics, electronics), the combined machine's increase in electronics is not significant [ ].
Changing The Role Of PET In Health Care With The 3D-CBS
Before the present design, as set forth in detail in , was devised, it was not economically advantageous to construct PET devices with increased axial FOVs, because the benefits of capturing more photons and decreasing the examination time were not thought to offset the significant increases in the costs associated with PETs with a longer FOV. In addition, the required radiation to the patient was too high to be repeated yearly.
However, with the new design, which entails a longer FOV and a radiation dose reduced to 1/30 of that required by the current PET, examination costs are substantially lowered and the patient's radiation exposure is well within the safety limits for annual use. Because 40 to 60 patients can be examined each day (in 10 to 12 hours) instead of the current 6 to 7, and because the cost of the radioisotope needed for each examination is also decreased, the cost increase of the new device with longer FOV (by a factor of 2 or 3 times that of the current PET) can be justified and amortized in a shorter time. (See Section XIII).
For the above reasons, the role of the PET can for the first time be changed from that of a treatment aid to that of a tool for annual preventive scanning for cancer and other systemic anomalies in asymptomatic people. It can also improve the grading, staging and follow-ups of discovered cancers by allowing more frequent examinations and providing additional clinical information to the physician during cancer treatment to best evaluate the effects of pharmaceutical treatments in a shorter time.
How does PET compare with other non-invasive imaging technologies?
In order to appreciate the great potential of PET, it is necessary to briefly discuss other imaging devices. PET has a unique ability to assess the functional and biochemical processes of the body's tissues in a manner far superior to any other non-invasive techniques such as CT, MRI, x-ray, or Single-Photon Emission Computerized Tomography (SPECT). The superior ability lies in revealing the molecular pathways of FDG, or other naturally existing compounds in the body.
Attempts to obtain functional imaging with CT or with MRI technology associated with contrast agents delivered to the patient are much more difficult, involve more discomfort and risk to the patient, and do not obtain the results that PET technology can provide. For example, functional MRI can process only one or a few slices of images at a time because the volume of data to be handled is too large, and it is difficult to understand what biochemical parameters are contributing to a specific electrical signal generated by the MRI device. Neither problem exists for PET because data can be acquired for a whole-body scan in a few minutes and not just for a few slices, and the two photons emitted in diametrically opposed directions provide unambiguous information on the path of the tracer (fluorine + sugar).
After PET, the technology most able to assess functional and biochemical processes is SPECT technology emitting a single photon; however, SPECT does not provide the simple opportunity to find the location of the emitting source of the two photons emitted in opposite directions that PET does. In fact SPECT technology, because it emits a single photon, makes use of collimators placed in front of the detector to detect the direction of the incoming photon that will be used for determining where it originated. A way to implement a collimator is to have multiple parallel holes (or holes with an angle) in lead material where the photons travelling with the desired acceptance angle pass through the holes to interact with the detector. A considerable number of photons escape the patient's body, but because they do not have the same direction as the holes in the lead, they are lost.
Why PET has not been widely used in the past 25 years in spite of the excellent, fast detectors available for 10 years
Although PET has been available for 25 years, it has not had a striking impact on hospital practice and has not been widely used, because electronics capable of fully exploiting the superiority of the PET technique have never been designed.
The great potential of PET is exploited only if it does not require the use of septa rings between detector rings, and if it has efficient electronics that do not saturate and that fully extract particle properties using a thorough real-time algorithm.
By contrast, advances in detector technology have been superb. Fast crystals have been available for more than 10 years (e.g., LSO with a decay time of the order of 40 ns), and detectors built from small crystals confine the dead time caused by a photon hit to a small area of the detector.
Features of the PET and CT devices and the design blueprints of the electronics exploiting them
One of the essential features of PET and CT is that they generate a high rate of events (hundreds of millions per second). Each event in PET consists of two photons of 511 keV emitted in diametrically opposed directions, and in CT of a single photon of the energy set by the operator (60-120 keV), attenuated by the tissue encountered during transmission through the patient's body. In order to capture most of the “good” events, one must be able to measure the photon's properties (energy, timing, and location) and to perform pattern recognition operations on groups of signals very quickly. The electronics should be capable of acquiring and processing millions of frames per second, where each frame consists of data relating to the properties of several photons. This can be compared to a camera taking millions of pictures per second and recognizing the objects in each picture. (See Sect. V-A).
This article details blueprints for significant improvements of the electronics and of the overall design by providing a solution that combines a striking increase in the efficiency of PET and CT in capturing more “good” [ ] photons with the complete elimination of motion artifacts, using the implementation shown in Figure 1 and Figure 3. This allows one to visualize minimal anomalies when a cancer is still small and emits few photons compared to other areas such as the brain, the urinary tract, and the heart, where tracer concentration is high even in the absence of cancer. Digital subtraction of pixels will visualize and magnify the minimal differences in glucose metabolism, not only in tissues with high metabolic activity but also in those with lower metabolic activity.
These improvements to the electronics will advance PET technology into a most beneficial, revolutionary diagnostic tool, which is at the same time cost-effective compared to other imaging modalities. Currently the best PETs detect about 2 photons out of 10,000 (see references  [ ], and [ ]). If used in 2-D mode as described in the next section, they can detect about 2 out of 100,000, while SPECT devices can detect only about 1 out of 200,000 photons emitted by the source (for one-head SPECT; about 1 out of 100,000 for two-head SPECT). The aim of this 3D-CBS design is to detect about 1 out of 10 photons emitted by the source.
Increasing the efficiency (calculated as the number of photon pairs detected in time coincidence divided by the number of photon pairs emitted by the tracer during the scanning period) yields higher-quality images and allows a lower radiation dose to be delivered to the patient. The recommended limit for whole-body radiation exposure at CERN and in the U.K. is 1.5 rem per year; in the U.S. it is 5 rem per year. The average background radiation received by a person in the U.S. is 0.36 rem per year, while today's typical PET examination with 10 mCi of FDG delivers 1.1 rem. The improvements in the electronics set forth in this article would require only 0.33 mCi of FDG for a PET examination, which delivers only 0.036 rem to the patient.
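The dose figures quoted above can be checked with a short Python calculation. It is only a sketch under one stated assumption, namely that the delivered dose scales linearly with the injected FDG activity; the function name `scaled_dose_rem` is hypothetical, and all numeric inputs are the ones given in the text.

```python
def scaled_dose_rem(ref_dose_rem: float, ref_activity_mci: float,
                    activity_mci: float) -> float:
    """Scale a reference dose linearly with injected activity (assumption)."""
    return ref_dose_rem * activity_mci / ref_activity_mci


# Values quoted in the text.
CURRENT_DOSE_REM = 1.1        # typical PET examination today
CURRENT_ACTIVITY_MCI = 10.0   # FDG activity for that examination
PROPOSED_ACTIVITY_MCI = 0.33  # FDG activity with the proposed electronics
BACKGROUND_REM_PER_YEAR = 0.36
LIMIT_REM_PER_YEAR = 1.5      # CERN / U.K. recommended limit

proposed_dose = scaled_dose_rem(CURRENT_DOSE_REM, CURRENT_ACTIVITY_MCI,
                                PROPOSED_ACTIVITY_MCI)
reduction_factor = CURRENT_ACTIVITY_MCI / PROPOSED_ACTIVITY_MCI

print(f"proposed dose: {proposed_dose:.3f} rem")
print(f"activity reduction factor: {reduction_factor:.1f}")
print(f"fraction of yearly background: {proposed_dose / BACKGROUND_REM_PER_YEAR:.2f}")
print(f"fraction of 1.5 rem limit: {proposed_dose / LIMIT_REM_PER_YEAR:.3f}")
```

Under the linear assumption, 0.33 mCi corresponds to about 0.036 rem, roughly a thirtieth of the current dose and about a tenth of the average yearly background in the U.S.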
Current PETs with short FOVs will also be improved using the design set forth in this article (see also Section 16 of ), by providing physicians with additional clinical information on a specific organ or area and by contributing to specificity in detecting and assessing cancer. In addition, non-saturating electronics with zero dead time will allow PET manufacturers to increase the FOV and significantly improve the PET's photon capturing efficiency to 10%, as shown in Figure 2b.
Measurements showing that the electronics is the factor limiting efficiency in current PET and those under design
That the electronics is the limiting factor of the efficiency of current PET (besides the plots of PET working in 3-D as described later) is shown by the fact that some PETs currently used in hospitals operate in what is called 2-D mode. 2-D refers to the use of septa rings placed between detector rings. This is used to limit the number of photons hitting the detector (in particular for body scan where Compton scattering is more numerous than in a smaller volume head-scan) because the electronics cannot handle the unregulated rate of photons hitting the detector. The real-time algorithm of current PET cannot thoroughly process all the information necessary to separate a good event from bad events.
The saturation of the electronics of current PET, even at low levels of radiation activity, is confirmed by the sensitivity measurements reported in the articles of the past 25 years and is graphically represented in a form similar to Figure 2a.
The limitation caused by the saturation of the CTI/Siemens electronics [ ] (at 10 Mcps), is shown in Figure 3 of [ ]. This is a simulation made by Moses and Huber (see reference ) of a PET camera that completely encloses a small animal in a volume formed by 6 planar banks of detector modules. The caption of Figure 3 of reference  says: “The random fraction is small due to the absence of “out of field” activity implicit with complete solid angle coverage, as well as a short coincidence windows. The total scatter event rate is 11% of the total true event rate. A maximum system count rate of 10 Mcps is assumed.” The plots shown in Figure 3 of  are compared with the measurements of the sensitivity of the existing MicroPET with short FOV and thin (10 mm) crystals of the CTI/Siemens [ ]. The latter also reveal saturation of the electronics in Figure 2a of .
Figure 2. (a) Typical sensitivity plots of current PETs, which are shown in articles of the past 25 years. The saturation of the electronics limits the capturing of the true events as the radiation activity increases. The randoms increase due to poor timing resolution. The “true + scatter” curve is not to be confused with the crystal's dead time because these days the crystals are cut in 2 mm x 2 mm, or 4 mm x 4 mm, and the dead time is confined to a small area of a few crystals out of the entire detector. A PET with non-saturating electronics should show a measurement of the type of “true + scatter (extrapolation).” Section (b) shows the change in PET efficiency with the improvements described in detail in . The efficiency is increased from about 2 photons detected out of 10,000 to about 1 out of 10. (The 10% estimated efficiency could vary as shown in the top section of Figure 2b, depending on the patient's weight, the FOV, and whether fast, expensive crystals or slow, economical crystals are used).
A coincidence is a pair of photons detected simultaneously by the detectors. There are three types of coincidence: “true,” “scatter” and “random.” Their sum is also called “prompt.” The “photons captured” mentioned in Figure 1 are “prompts.” The “true” events are the image-forming events; the “scatter” events are non-image-forming events in which a photon was Compton scattered in the patient's body and lost its direction information; and the “random” events are two photons detected within the required time difference but belonging to two different positron-electron annihilations. The efficiency defined in the previous subsection is the one used by manufacturers and designers in performance measurements. For uniformity in performance comparison, the same method is used in this article; however, see Section 13.4.7 of  for further separation of true, scatter and randoms.
The CT Section Of The 3D-CBS Multimodal Imaging Device
Several types of CT scanners can be integrated into the 3D-CBS scanner. This article describes the integration of the fastest CT scanner (often referred to as a fifth-generation CT system) with a design to enhance its features by eliminating the patient's bed motion. The principle of operation of the electron-beam fast CT scanner was first described in [ ]. Later, in 1983, Imatron Corporation developed the scanner and commercialized it. It is now a proven technology (see also [ , , , ]).
Current designs of the Electron Beam Computed Tomograph scanner (EBT) consist of an electron gun that generates a 130 keV electron beam. The beam is accelerated, focused, and deflected by the electromagnetic coils to hit one of the four stationary tungsten target rings, which emit x-ray photons. The x-ray beam is shaped by collimators into a fan beam that passes through the patient's body to strike a curved stationary array of detectors located opposite the target tungsten rings. A few rings of detectors covering an arc of about 210°, made of crystals coupled to sensors that convert light into current, detect the signals of the incident photons and send them to the data acquisition system. The patient's bed moves through the x-ray fan beam for a whole-body scan.
The proposed design of Figure 3 eliminates the patient's bed movement by increasing the number of tungsten target rings above and below the patient. One electron beam (or two, one sweeping the lower half of the detector and one sweeping the upper half) is accelerated, focused, and deflected by the electromagnetic coils at a desired angle to strike one of the tungsten rings. The collision of the electron beam with the target tungsten ring generates the x-ray fan beam (shaped by collimators), which passes through the patient's body to strike the opposite detectors (lower or upper half). One or two electron beams, sweeping at different deflections and hitting different target tungsten rings, will scan the patient's entire body in the FOV, with high resolution. The patient's body is surrounded by crystal detectors with apertures for the x-ray beam going from the tungsten rings to the detectors beyond the patient's body and having only the patient's body as an obstacle as shown in Figure 3 (The PMT and crystals close to the apertures are shielded from receiving the x-ray fan beam from the back of the detector). The same crystal detectors (see Section VII) used for detecting PET emission photons, are also detecting the CT transmission photons.
The attenuated x-rays detected by the CT, besides being used to display the anatomy of the body, will also serve as very accurate information for determining the attenuation correction coefficients for PET scanning. (See the last paragraphs of Section XII).
The geometry of the CT of Figure 3 lends itself to multi-slice acquisition to an even greater extent than the 16-slice-scanner presently under design by some manufacturers because it has several rings of detectors covering over one meter of FOV.
When specific high-resolution studies using the CT alone are needed, the technique of using one, four, or more positions of the patient's bed (not to exceed the distance between two detector x-ray beam apertures) will increase the resolution. If two scans are performed at half (or ¼) the distance between two detector x-ray beam apertures, each section of the patient's body will receive the x-rays from both sides at different angles without the need to incline the bed as is done in current CT (see Figure 3).
Gated techniques (a technique in which the heartbeat is synchronized with the scan views) or other techniques currently used with EBT can be easily implemented with this new design because they are facilitated by the stationary position of the patient.
Eliminating Motion Artifacts
The device described in this article differs from the PET/CT devices recently introduced in the market, and from those currently under design, in that it completely eliminates the motion artifacts of the sliding bed and uses the same detector to detect both CT and PET photons. The complete elimination of the artifact is possible because the scan is done in a single bed position by the two machines integrated in a single unit with a long field of view.
The EBT with extended FOV incorporated into the 3D-CBS provides additional advantages compared to the conventional CT. With the EBT, each organ is scanned in a fraction of a second by two electron beams hitting the two tungsten target semi-rings (top and bottom of the detector) that emit x-rays, while at the same time the PET emission photons from inside the patient's body are detected as described in Section VII. The problem of blurring images, or poor spatial resolution associated with imaging moving organs, such as the heart (as well as motion resulting from breathing) is overcome.
The recording of the 511 keV photons of the PET functionality with the timing information allows the software to replay the paths of the biological process at the molecular level in fast or slow motion on the physician's monitor.
The Technological Improvements Which Avoid Saturation Of Electronics, Improve Efficiency Of Current PET, Allow The Extension Of The FOV And Increase Patient Throughput
The improvement in the efficiency of PET and CT is achieved by accurately measuring the properties of most photons that escaped from the patient's body (PET) and that went through the patient's body (CT) and hit the detector. After measuring and validating the “good” ones, a circuit should identify those coming from the same PET event. This requires electronics and algorithms, which are both fast and advanced.
Designers of the electronics of past and current PET, or CT (and designers of the electronics for particle identification in High Energy Physics [ ], [ ]), have approached the goal of the single photon validation requirement by making compromises between (a) a high or low sampling rate, (b) a large or small number of bits of information to handle from each input channel at each sampling clock, (c) thorough (with subdetectors and/or neighboring signal correlation operations) or approximate real-time algorithms, and (d) complex or simple circuits. Within these limitations, conventional thought was that performance improvement would most likely come from a faster processor, FPGA, ASIC, or circuit provided by advances in technology.
Because of the solution described in this article, it is no longer necessary to sacrifice one (high sampling rate) for the other (a good, thorough, real-time, unpartitionable algorithm). This solution does not require the use of faster electronics, but instead is based on the advantages provided by the 3D-Flow architecture [14, 22] and its implementation.
The concept of this unique 3D-Flow architecture is shown in Figure 4 and the synchronous data flow through the 3D-Flow system is shown in Table 1. Figure 5 and Figure 6 show the detail of the hardware implementation with short, equal-length Printed Circuit Board (PCB) traces, allowing the use of low-power consumption drivers that solve the problem of ground bounce, noise, cross-talk, and skew between signals. An example of the implementation of the 3D-Flow architecture that clarifies the new concept in simple terms, can be found in Cunningham's statement  (director of the largest Montessori school in the U.S): “in learning the theoretical ideas through the practical activities.”
Design of a system with high throughput and an efficient photon identification, real-time algorithm for a higher sensitivity PET
The problem introduced in Section II, similar to that of taking millions of pictures per second and recognizing the objects in each picture, is described here in a more detailed implementation. A 3D-Flow system samples the detector at 20 MHz (equivalent to taking 20 million pictures per second) and processes the data every 50 ns (equivalent to recognizing the objects in each picture). The data come from 1,792 channels with different location IDs, as shown in the example of Figure 11, each containing 64 bits of information on energy, DOI, location and timing. The conceptual approach to solving this problem is the following:
First, one should design a complete, real-time algorithm that extracts the information from various detectors for the best identification of photons. This algorithm may even require the execution of an irreducible number of operations for a time longer than the time interval between two consecutive input data. One example of such an algorithm is the need to correlate information from several subdetectors, or neighboring detector elements. In the event that information from neighboring detectors is needed, each processing element sends the information received from its detector element to the neighboring processors, waits to receive information sent by the neighbors, and then processes the data (to reduce their number) before sending them to the next pipeline stage. Processing elements may need hundreds of nanoseconds (“ns”) to complete processing, but they also need to cope with data arriving at the input every tens of ns. Current designs based on the well-known pipelining technique cannot fulfill these requirements because they prevent the use of operations (uninterruptible and lasting hundreds of ns) that correlate information from neighboring signals, and this signal correlation is essential for better photon identification. Additional processing by the photon identification real-time algorithm is described in Section VI.
Second, the design must satisfy the need to execute an unpartitionable algorithm longer than the time interval between two consecutive input data. This is accomplished by duplicating several identical circuits working in parallel and out of phase with respect to the time interval between two consecutive input data. The ratio of execution time to input data period determines the number of circuits required.
Third, these identical circuits must be implemented in a physical architecture for optimal efficiency, with an arrangement designed to provide a uniform time delay of the signal propagation between them, regardless of their number. The design must focus around the concept that no signal of the data flow (3D-Flow bottom to top port) of the programmable hardware will be transmitted a distance longer than that between two adjacent circuits (See Figure 4, Figure 5, and Figure 6).
Fourth, the 3D-Flow architecture must work in a synchronous operation mode with registers in between circuits, as shown in Figure 4, to assure maximum throughput. This is because at each cycle, all signals through the system should travel only through short, equal-distance paths.
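The four requirements above rest on simple arithmetic, sketched here in Python. The sampling rate, channel count and packet width are the values given in the text, and the 250 ns execution time is the example the article uses for Figure 4; the function name `circuits_needed` is hypothetical and the ceiling rule is an assumed reading of "the ratio of execution time to input data period."

```python
import math

SAMPLING_RATE_HZ = 20_000_000  # detector sampled at 20 MHz (from the text)
CHANNELS = 1_792               # input channels (from the text)
BITS_PER_PACKET = 64           # bits per channel per sample (from the text)

# Time between two consecutive input data on one channel.
input_period_ns = 1e9 / SAMPLING_RATE_HZ

# Aggregate input rate over all channels, and raw bit rate of one channel.
packets_per_second = SAMPLING_RATE_HZ * CHANNELS
bits_per_second_per_channel = SAMPLING_RATE_HZ * BITS_PER_PACKET


def circuits_needed(execution_ns: float, period_ns: float) -> int:
    """Identical out-of-phase circuits required so that none is still busy
    when its next input arrives (assumed rule: ceiling of the ratio)."""
    return math.ceil(execution_ns / period_ns)


print(input_period_ns)                          # 50.0
print(packets_per_second)                       # 35840000000
print(bits_per_second_per_channel)              # 1280000000
print(circuits_needed(250.0, input_period_ns))  # 5
```

The aggregate figure of about 35.8 billion packets per second agrees with the input bandwidth of 35 billion events per second quoted in the abstract, and a 250 ns algorithm fed every 50 ns calls for five identical circuits.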
Unlike the well-known pipelining technique shown in stages a, b, c, e, and f of Figure 4, the novel 3D-Flow system architecture, shown in the dashed lines of the same figure for stations 1d, 2d, 3d, 4d, and 5d, inputs data during every unit of time into whichever of the five stages d is free (for example, every 50 ns, with each processing unit able to process the received data for 250 ns). The merit of the 3D-Flow architecture lies in the hardware implementation of the connection between the bottom port of one chip and the top port of the adjacent chip with minimal distance between components, as shown in Figure 5 and Figure 6 for the concept described in the dashed lines of Figure 4.
Design verification of the technique providing higher throughput
In order to verify the validity of a design, one can describe the behavior of each unit of the design, and the interrelations between the units, and then have the data flow through them. A detailed simulation from top level to the silicon gate level has been performed as described in [1, , , 22]. The simulation of the concept has also been performed by young students in a “hands on” practice where each student implements the behavior of his unit as described in [ ].
The behavior of each unit (represented in Figure 4, Figure 5, and Figure 7 with a symbol) is the following:
The long rectangle with the dotted arrow inside means “bypass switch.” The behavioral model of the example of Figure 4 can be explained as repeating forever two operations: (a) move (“I/O”) one data packet from the input (called “Top port”) to the processor while simultaneously moving one result data packet of the previous calculation from the processor to the output (called “Bottom port”), and (b) move (“bypass”) four data packets in succession from input to output, taking time t1 to move each packet. The bypass switch does not interpret the content of the message; instead it uses a preprogrammed function that counts the number of data packets to send to the processor and the number to bypass. Because the entire system is synchronous, the resulting flow of input and output data packets is as shown in Table 1.
The square is a register (or storage element during one clock cycle) that sends out a data packet and receives a new one when the time-base clock advances one step. The propagation time of this stage is t2.
The rectangle below the switch is the symbol of the process execution task, or function, applied to the input data. Each process on a new set of data at any of the stages d is executed from beginning to completion. For the example shown in Figure 4, the execution time is: tP = 4(t1 + t2 + t3) + t1.
The solid right arrow means the delay of the signal on the Printed Circuit Board (PCB) trace connecting the pin of the bottom port of the 3D-Flow processor in one chip to the pin of the top port of the 3D-Flow processor on the adjacent chip. For the example shown in Figure 4, Figure 5, or Figure 6, t3 is the delay provided by the signal on a 3 cm PCB trace. The 3 cm length is due to the example of this application using a 672-pin EBGA component of 27 mm per side. A smaller component will allow a shorter PCB trace.
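The unit behavior listed above can be illustrated with a small simulation. The following Python sketch is not the 3D-Flow simulator; it is a minimal model that assumes five layers, one time slot per data packet, and no register or trace delay (t2 and t3 are ignored, so every layer sees a given slot at the same time). The names `run_layers` and `execution_time`, and the tuple used to mark a result, are hypothetical.

```python
def run_layers(packets, n_layers=5):
    """Pass a stream of packets through n_layers cascaded bypass switches.

    Assumption: layer k is preprogrammed to take the packet in every slot
    whose index is congruent to k modulo n_layers and to bypass the others.
    """
    stream = list(packets)
    for layer in range(n_layers):
        held = None  # result of this layer's previous calculation, if any
        out = []
        for slot, item in enumerate(stream):
            if slot % n_layers == layer:
                # "I/O": fetch a new packet and emit the previous result
                out.append(held)
                held = ("result", item)
            else:
                # "bypass": forward the packet (or an upstream result) unchanged
                out.append(item)
        stream = out
    return stream


def execution_time(t1, t2, t3):
    """Execution time quoted in the text for the five-layer example."""
    return 4 * (t1 + t2 + t3) + t1


if __name__ == "__main__":
    for slot, item in enumerate(run_layers(range(10))):
        print(slot, item)
```

In this model the first five output slots are empty and the results then leave the last layer in the original packet order, one per time slot, which is the behavior the bypass counting is meant to guarantee.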
Implementation merits of the 3D-Flow design
The 3D-Flow system opens new doors to accurately measuring photon properties in real time by providing the supporting architecture to execute thorough algorithms with zero dead time. Users had not previously envisioned executing such algorithms in real time, because doing so would have required electronics that were too costly and complex. For some applications with demanding performance requirements, the current approach would not provide a solution at all. For those applications, the 3D-Flow architecture provides a solution because of its simple implementation.
The 3D-Flow implementation allows achievement of high-speed input data throughput at a very low power consumption, which minimizes the problems of ground bounce and cross-talk.
The modularity of the 3D-Flow system permits the implementation of scalable systems, where the complexity of the algorithm or the throughput of the system can be increased.
When an unpartitionable, real-time algorithm needs to execute a longer and more complex task, several programmable, 3D-Flow chips can be cascaded.
One of the key features of the 3D-Flow architecture is the physical design of the PCB.
During the pin assignment phase of the ASIC design, each pin carrying a 3D-Flow bottom port output is placed adjacent to a pin carrying the input of the corresponding top port bit.
This allows for uniform trace length when connecting processors of adjacent, cascaded 3D-Flow chips and also allows traces that do not cross each other.
This regular pattern of the PCB traces eliminates cross-talk and signal skew and easily allows impedance matching and a simple low cost PCB construction.
The need to carry unidirectional signals on short PCB traces of equal length, as described above, requires only simple, low-power (a few mW) I/O drivers and receivers with a differential signal voltage of a few hundred mV. The driver needs to drive only one load at 3 cm or less (if the 3D-Flow component is smaller, it will need to drive a PCB trace a few millimeters longer than the side of the component). By contrast, implementations other than the 3D-Flow architecture that attempt to build a system with similar performance, as described in solution No. 3 of Figure 7, will need to use a generic I/O driver (e.g., a Low Voltage Differential Signaling (LVDS) driver dissipating 35 mW and an LVDS receiver dissipating 15 mW). These generic drivers provided by ASIC manufacturers, designed to drive distances of a few meters, will create problems of high power consumption, ground bounce, etc., at system level that will be difficult or impossible to overcome. The power consumption of the generic I/O driver will be too high for the number of I/O ports needed on a Printed Circuit Board (PCB) or in the system. For example, in our case the need to drive 672 bottom-to-top port connections at 640 Mbps on the PCB (see Figure 11 and Figure 12 and Section V-C.2), each consuming 50 mW, results in a total of 33.6 W. This must be added to the power dissipation of the other electronics on the board and to that of the North, East, West, and South links going out of the board, which will create serious system problems.
The above implementation merits of the 3D-Flow architecture allow for:
The construction of a very high performance system that can execute n consecutive instructions on a system having an input data rate equal to the fastest implementation of the 3D-Flow processor. Although the latency of the result provided by such a system is longer than the time interval between two consecutive input data, the resulting processing capability of the system on the incoming data is equivalent to that of a processor running at n times the speed of the fastest implementation of the 3D-Flow processor (where n is the number of layers of the 3D-Flow system). For example, a 20-layer system with the processor running at 250 MHz provides a processing capability on the incoming data equivalent to that of a 5 GHz processor. The bits on the I/O bus are transferred from the input of one chip to the input of the next chip with a delay of t1 + t2 + t3. The system throughput is therefore limited by the sum of the switching time t1 of the bypass switch, the propagation time t2 of the D register, and the propagation time t3 of the signal on the 3 cm PCB trace (see Figure 4, Figure 5 and Figure 6). Advanced technologies allow for the implementation of the above functions (t1 + t2 + t3) with a total propagation time of hundreds of picoseconds, providing a throughput of several GHz.
The construction of a low-cost system with a high throughput. The designer selects the technology and processor speed that can be afforded within a given budget. For example, assume that the maximum chip-to-chip speed that one would like to handle is 640 Mbps, the processor speed is 80 MHz, and the system throughput is one 64-bit word at 20 MHz. A 3D-Flow system with 5 layers and the above characteristics will provide the capability to execute on each processor a programmable, unpartitionable, real-time photon identification algorithm of 20 steps (which will include data exchange with neighbors). This will require only two communication channels, each with 32-to-1 multiplexing, for the communication between the bottom port of the 3D-Flow processor of one chip and the top port of the 3D-Flow processor on the adjacent chip. All the above parameters are achievable with a straightforward implementation of electronics that presents no particular difficulties. For example, the board shown at the bottom right of Figure 11, or top left of Figure 12 (see more details of the 3D-Flow DAQ-DSP board in Section 220.127.116.11 of ) would require one to implement 672 bottom-to-top PCB traces (calculated as 5 cascaded chip-to-chip connections times 16 processors per chip times 2 lines per port times 4 chips per board, plus 32 traces to the 3D-Flow pyramid chip), 3 cm in length, matched in impedance and carrying signals at 640 Mbps from drivers implemented in the 3D-Flow ASIC with a differential signal voltage of a few hundred mV and a power consumption of a few mW.
Considering that (a) there are PCBs developed for telecommunication applications with data speeds of several GHz, on much longer traces than 3 cm, and (b) that the LSI Logic G12 ASIC Cell-Based technology provides up to 33 million usable gates on a single chip (~65,000 gates/mm²) at a power consumption of 22 nW/MHz/Gate (1.8 Volt supply, 0.13 µm L-effective CMOS technology), the required 1.7 million gates of the 3D-Flow chip with 16 processors is not among the largest chips built, nor is it a relatively “high risk” chip to build.
The architecture of the 3D-Flow system enables it to provide the significant advantages of both high performance and simplified construction at a low cost.
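The figures quoted in the two examples above can be cross-checked with a few lines of arithmetic. The Python sketch below only recombines numbers stated in the text; the variable names are illustrative, and the reading of "20 steps" as 5 layers times 4 processor cycles per input period is an assumption that happens to reproduce the quoted value.

```python
# Low-cost example: values quoted in the text
LINK_RATE_MBPS = 640   # maximum chip-to-chip speed per line
PROCESSOR_MHZ = 80     # processor clock
INPUT_RATE_MHZ = 20    # one 64-bit word every 50 ns
WORD_BITS = 64
LAYERS = 5

# Assumed reading: each layer has LAYERS input periods to work on one packet,
# but spends processor cycles on it at PROCESSOR_MHZ / INPUT_RATE_MHZ per period.
cycles_per_input_period = PROCESSOR_MHZ // INPUT_RATE_MHZ
algorithm_steps = LAYERS * cycles_per_input_period

# Bandwidth needed between a bottom port and the next top port
port_rate_mbps = WORD_BITS * INPUT_RATE_MHZ
lines_per_port = port_rate_mbps // LINK_RATE_MBPS
bits_multiplexed_per_line = WORD_BITS // lines_per_port

# Trace count on one board: 5 chip-to-chip hops x 16 processors per chip
# x 2 lines per port x 4 chips per board, plus 32 traces to the pyramid chip
traces = 5 * 16 * 2 * 4 + 32

# Power if every connection used a generic 35 mW driver + 15 mW receiver
generic_power_mw = traces * (35 + 15)

# High-performance example: 20 layers at 250 MHz
equivalent_processor_mhz = 20 * 250

print(algorithm_steps, lines_per_port, bits_multiplexed_per_line)
print(traces, generic_power_mw / 1000, equivalent_processor_mhz / 1000)
```

The script reproduces the 20 algorithm steps, the two lines per port with 32-to-1 multiplexing, the 672 traces, the 33.6 W total for generic drivers, and the 5 GHz equivalent processing capability.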
Comparisons between the 3D-Flow system and other techniques
For better understanding of the advantages of this novel architecture, a comparison is made with other techniques:
The simplest approach to the solution of the execution of a task (see solution No. 1 in Figure 7) is to build a circuit or processor that executes in sequence all necessary operations and does not fetch new input data until the processing of the previous data has been completed.
Another approach which increases efficiency is the well-known pipeline technique used in many applications (e.g., computer architecture) for more than half a century. This technique allows an increase in the throughput by splitting the processing of a task in “n” smaller operations, each executing an nth subdivision of the global task (see solution No. 2 of Figure 7).
When a stage of the pipeline of the previous technique requires the execution of an unpartitionable algorithm longer than the time required by the other stages, the circuit at that stage can be copied and connected by means of a “Generic Switch” to the previous and following stages as shown in solution No. 3 of Figure 7. Because the designer has to lay the components on a PCB, he will face a limit in keeping the distance short. When a signal is going from one component to several components, the path will necessarily be longer for some with respect to others, increasing the signal skew. This will create timing problems. The split from one data point to several data points (“fanout”) should drive more than one unit, requiring high power consumption, which creates spikes, noise, and “ground bounce,” when several outputs switch at the same time. There is no modularity in the implementation, and when the algorithm needs to be increased and more circuits are required, the fanout may not be sufficient, requiring additional buffers for each line. As circuits need to be added, the PCB board territory (PCB real estate) increases with the consequence that the components will be further apart from each other, thus requiring additional circuits in parallel to make up for the lost efficiency in communication speed. Soon the limit of the throughput becomes the power consumption and the distance between components, making this solution undesirable.
The 3D-Flow system solution No. 4 of Figure 7 copies the circuit (or processor) coupled to a bypass switch and a register at the stage where it is necessary to execute an unpartitionable algorithm longer than the time required in the other stages (sequentially-implemented, parallel architecture). This simplifies the construction because it requires short point-to-point connections that need only a very low power driver. The hardware can achieve better performance at a lower cost, because any added circuit will not increase the power consumption on other circuits, require additional drivers or more powerful buffers, or increase the length. The only parameter increase is the latency.
Increasing Sensitivity Improves Resolution, Data Quality And Detection Ability, And Requires Lower Radiation
The previous sections described the architecture that allows an increase of the throughput in a Data Acquisition (DAQ) system and also described how it is possible to execute a fast, unpartitionable, thorough, real-time algorithm on each input data packet. With the supporting architecture in place, this section gives a short description (with more details in the references) of the type of calculations that are performed in the thorough, unpartitionable, real-time algorithm in order to improve accuracy and sensitivity and to capture more “good” photons. Section VI-D describes (and provides references for more details) how the coincidence detection circuit used in current PET can be simplified, reduced in cost and designed to meet the requirements of zero dead time for the maximum radiation that a detector should ever handle.
The programmability of the 3D-Flow system at each detector channel provides the flexibility to execute any user defined real-time algorithm.
For brevity, only a few examples of real-time algorithms that extract the information from the signals received from the detector and accurately measure the properties of the incident photons are described in this article (references to real-time algorithms for particle identification described in more detail can be found in Section 13.4.8, 13.4.9, 13.4.10, and 13.4.11 of ). However, users can execute their own real-time algorithms that they have tested off-line on detector data. One example of such an algorithm is the one tested off-line in some universities on single photon emission data. This algorithm aims to determine the direction of the incident photon of a known energy when the information of a single scatter plus absorption, or the information of three scatters, is provided. Successfully translating such off-line algorithms into 3D-Flow real-time algorithms would allow one to consider the construction of a SPECT without the need of a lead collimator.
One of the important features added to the design described in this article and in  is the accurate calculation and assignment of a “time-stamp” to the incident photon (see Section VI-D).
By calculating the differences between the accurate time-stamp of different incident photons, it is possible to isolate data packets belonging to a PET event or to a Compton scatter event. After this separation, the 3D-Flow processing system routes the data packets' information about a specific event to a processing unit for extracting and measuring the particle's properties (e.g., its incoming direction and energy.)
Other examples of operations performed by the 3D-Flow during the execution of the real-time algorithms are the following: (a) measuring the spatial resolution using interpolation or centroid calculation, as described in Section VI-B; (b) calculating the local maxima, which avoids double counting of the photons (see Section 13 of  for more details); (c) measuring the energy resolution, as described in Section VI-C; (d) improving the time resolution (see Section VI-D); (e) event integration from slow crystals using digital signal processing (DSP) techniques (see Section VI-A); (f) resolving signal pileup by using DSP techniques when slow crystals are used (see Section VI-A of this document and Section 18.104.22.168 of ); and (g) measuring the Depth of Interaction (DOI). DOI measurements solve the problem of identifying the crystal when the incident photon penetrates obliquely (instead of perpendicularly) to the face of the crystal looking toward the emitting source. The effect commonly referred to as “parallax error” occurs when DOI is not measured. (See Section 22.214.171.124 and Section 13.4.9 of  for more details).
All the above contribute to increasing the sensitivity of the 3D-CBS scanner, which allows for recording better data quality and increased detection ability, avoids erroneous readings (false positives) and allows the reduction of the radiation delivered to the patient to 1/30th that of current PET. (See Section 14 and Figure 14-1 of  for a more complete estimate of the loss of PET emission photons at all stages).
The improvement of the electronics in capturing PET emission photons will also result in capturing more CT transmission photons, thus lowering the radiation required during a CT scan. By solving the saturation problem of the electronics of the current PET and being able to process even more photons at low cost, it is possible to increase the FOV dramatically. A detailed description of its implementation is available in Section 13 and 15 of .
Digital Signal Processing at each detector channel
Signals from each detector channel are converted to digital by flash analog-to-digital converters and processed in real-time by programmable 3D-Flow processors. Examples of the sequence of 3D-Flow instructions of a real-time algorithm for photon identification can be found in Section 126.96.36.199 and Section 188.8.131.52 of . A 3D-Flow processor executes the typical arithmetic and logic operations, the multiply accumulate operations and those of moving data from input ports to output ports.
This programmability allows the user to execute on each channel a customized program for every detector, in order to take into account small variations in crystal properties. Some examples of programs that can be executed are the following:
Event integration. When slow crystals are used, DSP techniques are used to digitally integrate the signal. By analyzing the pulse shape of a signal digitally, it is possible to obtain better spatial and energy resolution, to detect the start of an event with greater accuracy, and to assign it a precise time-stamp.
Pileup separation. When two events occur in a nearby detector area within a time interval shorter than the decay time of the crystal, the apparent integral of the second signal will show it riding on the tail of the previous signal. DSP techniques can detect the change of slope of the tail of the signal and separate the two signals. This technique can improve existing PET just by changing the electronics without costly hardware detector upgrade.
Normalization. Recording of photons at different energies and correcting them for displaying a good image with the right contrast can be achieved by normalizing the input data through the 3D-Flow look-up tables or through corrections obtained with data processing.
Signal-to-noise ratio improvement. The DSP functionality of the 3D-Flow can execute on each channel standard techniques of signal processing to improve S/N ratio.
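As an illustration of the event integration and pileup separation items above, the following Python sketch integrates a sampled pulse and flags a second event riding on the tail of the first by looking for a renewed rise in the signal. It is a simplified stand-in, not the 3D-Flow instruction sequence: the function names, the sample values and the rise threshold are all hypothetical, and no tail subtraction is attempted.

```python
def find_pulse_starts(samples, rise_threshold):
    """Return the sample indices at which a new rising edge begins.

    A rising edge is assumed to start where the signal increases by at
    least rise_threshold between two consecutive samples.
    """
    starts = []
    rising = False
    for i in range(1, len(samples)):
        step = samples[i] - samples[i - 1]
        if step >= rise_threshold:
            if not rising:
                starts.append(i)
            rising = True
        else:
            rising = False
    return starts


def integrate(samples, start, stop):
    """Digitally integrate the signal over samples[start:stop]."""
    return sum(samples[start:stop])


# Hypothetical ADC samples: a decaying pulse starting at index 2,
# with a second, smaller pulse piling up on its tail at index 6.
samples = [0, 0, 100, 80, 64, 51, 101, 81, 64, 52]
starts = find_pulse_starts(samples, rise_threshold=20)
first_integral = integrate(samples, starts[0], starts[1])
print(starts, first_integral)
```

With these made-up samples the two events are found at indices 2 and 6. A real implementation would also have to subtract the extrapolated tail of the first pulse from the second before integrating it.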
Higher accuracy in spatial resolution
Increasing the Field of View also improves the spatial resolution because more pairs of photons in time coincidence can be captured, and those intersecting at 90° allow for better spatial resolution. (See Figure 8).
Spatial resolution is also improved by the centroid calculation algorithm which is now possible because of the exchange of data between neighboring processors without boundary limitation described in the next section and in Section 13.4.8 and Figure 13-12 of .
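A centroid calculation of the kind mentioned above can be sketched as an energy-weighted average over a 3x3 neighborhood. The Python example below is a generic illustration under stated assumptions, not the algorithm of the cited section: the function name `centroid_3x3`, the unit pitch between detector elements and the sample energies are all hypothetical.

```python
def centroid_3x3(energies):
    """Energy-weighted centroid of a 3x3 neighborhood.

    energies is a list of three rows of three values. The result is the
    (x, y) offset from the central element, in units of the element pitch.
    """
    total = sum(sum(row) for row in energies)
    if total == 0:
        raise ValueError("no energy deposited in the neighborhood")
    x = sum((col - 1) * e for row in energies for col, e in enumerate(row)) / total
    y = sum((r - 1) * e for r, row in enumerate(energies) for e in row) / total
    return x, y


# Hypothetical energies: most of the signal in the central element,
# with slightly more light shared to the right-hand neighbor.
neighborhood = [
    [0, 10, 0],
    [10, 60, 20],
    [0, 10, 0],
]
x, y = centroid_3x3(neighborhood)
print(x, y)
```

The interpolated position is shifted slightly toward the right-hand neighbor, which is the sub-element information that a single-element readout cannot provide.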
Higher accuracy in energy resolution
With the 3D-Flow sequentially-implemented, parallel architecture, it is now possible to increase the energy resolution of each incident photon in the detector by more accurately measuring it with the execution of a longer, thorough algorithm (see Figure 9).
Figure 9 shows the difference between the electronics of current PET, which does not extract the particle properties accurately and the technique used in the new 3D-CBS device.
The 3D-Flow system provides the capability to exchange information relative to 2x2, 3x3, 4x4, or 5x5 detector elements in a cost effective manner, after raw data have been fetched from the detector by an array of 3D-Flow processors (see Section 13.4.11 on page 112 of  for details).
In addition, this neighboring information exchange feature allows for many photons to be captured which “Compton scattered” in the crystals (see Section 14.6.3 on page 142 and Section 184.108.40.206 on page 99 of ). These photons are lost by the electronics of the current PET devices because the communication among PMTs is limited to 2x2 elements and photons that are “Compton scattered” in the crystals might spread the energy throughout a larger area.
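The benefit of a wider neighbor exchange for Compton-scattered photons can be made concrete with a toy example. The Python sketch below compares the energy recovered by a 3x3 sum with the best 2x2 sum around the same hit, and shows a local-maximum test of the kind used to avoid double counting. The grid values are invented so that the total equals 511 keV, and the function names are hypothetical.

```python
def block_sum(grid, top, left, size):
    """Sum the energies in a size x size block whose corner is (top, left)."""
    return sum(grid[r][c]
               for r in range(top, top + size)
               for c in range(left, left + size))


def is_local_max(grid, r, c):
    """True if grid[r][c] is strictly larger than all existing neighbors."""
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if (dr, dc) == (0, 0):
                continue
            if 0 <= rr < len(grid) and 0 <= cc < len(grid[0]):
                if grid[rr][cc] >= grid[r][c]:
                    return False
    return True


# Invented energies (keV) of one photon that scattered into its neighbors
grid = [
    [0, 0, 0, 0, 0],
    [0, 0, 40, 0, 0],
    [0, 30, 350, 61, 0],
    [0, 0, 30, 0, 0],
    [0, 0, 0, 0, 0],
]

sum_3x3 = block_sum(grid, 1, 1, 3)
best_2x2 = max(block_sum(grid, top, left, 2)
               for top in (1, 2) for left in (1, 2))
print(sum_3x3, best_2x2, is_local_max(grid, 2, 2), is_local_max(grid, 2, 3))
```

In this example the 3x3 sum recovers the full 511 keV, while no 2x2 block containing the central element collects more than 451 keV, so an energy window applied to the 2x2 sum could reject a valid photon.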
Higher accuracy in time resolution
Achieving a better time resolution reduces randoms. The capability to assign a time-stamp to each photon detected is achieved by using the DSP technique as described in Section VI-A, or by using the Constant Fraction Discriminator (CFD) at the front-end, which generates a signal edge that is digitized in time by the Time-to-Digital converter (TDC) with a resolution of 500 ps. A higher time resolution could be achieved; however, 500 ps is sufficient for a PET device assisted by Time of Flight (TOF) information, as this one is intended to be, and it avoids the need for expensive fast electronics. Other techniques that aim to determine the location of the interaction by measuring the time of flight require more expensive electronics with a resolution of the order of 50 ps. (References on TOF PET can be found in Section 220.127.116.11 of .)
The digitized time information is sent and further improved in resolution by the 3D-Flow DSP (See Section 18.104.22.168 and Section 22.214.171.124 of  for more details).
A very important phase of the process for improving timing accuracy is the calibration, which is described in some detail in Section 13.4.10 of .
In order to find photons in coincidences, the electronics calculate the time interval between the time-stamp of two photons that hit the detector (see bottom left of Figure 14-5 of ). An accurate time-stamp will allow one to set a maximum time interval between two hits for which the photons will be accepted. This interval will be related to the maximum difference in distance that the two photons traveled before striking the detector.
Thus, if the maximum time interval for accepted coincidence photons is small there is lower probability of recording randoms (or photons belonging to two different events).
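The relation between the accepted time interval and the maximum path difference follows from the speed of light. The Python sketch below converts a path difference into a coincidence window and expresses it in TDC ticks of 500 ps, the resolution quoted above. The 0.6 m path difference is a hypothetical value chosen only for illustration, and the function names are not taken from the cited design.

```python
import math

SPEED_OF_LIGHT_M_PER_NS = 0.299792458  # metres travelled per nanosecond


def coincidence_window_ns(max_path_difference_m):
    """Largest arrival-time difference for two photons of the same event."""
    return max_path_difference_m / SPEED_OF_LIGHT_M_PER_NS


def window_in_tdc_ticks(window_ns, tick_ns=0.5):
    """Round the window up to a whole number of TDC ticks."""
    return math.ceil(window_ns / tick_ns)


window = coincidence_window_ns(0.6)  # hypothetical 0.6 m path difference
print(round(window, 3), window_in_tdc_ticks(window))
```

A path difference of 0.6 m corresponds to about 2 ns, or 5 ticks of 500 ps once rounded up. A practical window would also need a margin for the timing resolution of the detector and electronics.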
Simpler, efficient and lower cost detection circuits
In the new design described in , only the detector elements coupled to a PMT or APD, hit by a photon which was validated by a thorough real-time, front-end pattern recognition algorithm, are checked for coincidence. This method is much simpler than the one used in the current PET, which compares all of the possible LOR (see reference [ ], or Section 13.4.14 of  for more details). The number of comparisons for finding the coincidences is proportional to the radiation activity and not to the number of detector elements, as it is in the current PET. The advantage of the new approach, which requires simpler electronics, is that with only 1.2 x 10⁸ comparisons per second it achieves an efficiency equivalent to that of a current PET that would perform 2.6 x 10¹⁶ comparisons per second (see Section 14.7.2 on page 148 of  and Section 6.3.3 of  for more detail).
In the new design, the coincidence detection problem is solved with simple electronics as described in Section 126.96.36.199 on page 123 of . A simple implementation funnels all hits detected to a single point, sorts the events in the original sequence, as shown in Figure 13-22 of , and compares all hits within a given time interval for validation of time-stamp and location situated on an LOR passing through the patient's body.
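The funnel, sort and compare scheme described above can be sketched in a few lines. The Python example below is a software analogy under stated assumptions, not the circuit of the cited design: hits are represented as hypothetical `(time_stamp, detector_id)` pairs in TDC ticks, and the geometric check that the LOR passes through the patient's body is replaced by the trivial requirement that the two detectors differ.

```python
def find_coincidences(hits, window_ticks):
    """Pair validated hits whose time-stamps differ by at most window_ticks.

    Returns the list of detector pairs and the number of time comparisons
    performed, which grows with the number of hits rather than with the
    number of detector elements.
    """
    ordered = sorted(hits)  # restore the original time sequence
    pairs = []
    comparisons = 0
    for i in range(len(ordered)):
        t_i, det_i = ordered[i]
        for j in range(i + 1, len(ordered)):
            t_j, det_j = ordered[j]
            comparisons += 1
            if t_j - t_i > window_ticks:
                break  # later hits are even further away in time
            if det_i != det_j:  # placeholder for the LOR geometry check
                pairs.append((det_i, det_j))
    return pairs, comparisons


# Hypothetical hits, deliberately out of order as they might arrive
hits = [(250, 40), (103, 912), (601, 830), (100, 7), (600, 15)]
pairs, comparisons = find_coincidences(hits, window_ticks=6)
print(pairs, comparisons)
```

For these five hits the sketch finds two coincidences with five comparisons. Adding detector elements without adding hits would not change that count.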
Detection, Validation And Separation Of Events From Different Modalities (PET/CT)
Reference [ ] describes a detector module for multimodal PET/CT made of a multi-crystal detector CsI(Tl)/LSO/GSO coupled to APD, capable of discriminating low-energy X-rays (60 keV), medium-energy X-rays (120 keV, used for CT of overweight patients) and the 511 keV γ-rays used with PET.
In Figure 10a, the authors  propose a thin (3 mm) CsI(Tl) scintillator sitting on top of a deep GSO/LSO pair read out by an avalanche photodiode (APD). A channel consists of all signals from all detectors coupled to sensors (APD, photomultipliers, photodiodes, etc.) within a given view angle of the detector seen from the radioisotope source located in the patient's body. In this application a channel is 64-bit. See also reference [ ], and Table 1, Figure 4, Figure 11 and Section 188.8.131.52 of .
The article  also reports additional tests made on another phoswich detector that consists of YSO/LSO coupled to APD, as shown in Figure 10b.
The GSO/LSO pair provides depth of interaction (DOI) information for the 511 keV detection in PET. Measurements (see reference ) show that CsI(Tl) achieves the best energy resolution and largest time separation at all energies (60 keV, 140 keV, and 511 keV) and should have a thickness such that all X-rays will be absorbed in CT mode.
The medium-energy γ-rays of 120 keV (measurements were made by the authors of  at 140 keV) will interact in the two front layers of the detector (CsI(Tl) and LSO) and are not expected to reach the bottom GSO layer.
The measurements reported in  can be easily implemented in the real-time algorithm executed by each 3D-Flow processor (see Section V). First, the energy of the photons is validated by summing and comparing with the neighbors, and then the CT photons are separated from the PET photons as described in detail in Section 13 of .
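The separation step can be pictured as an energy-window classification applied after the neighbor sum. The Python sketch below assumes, purely for illustration, a window of plus or minus 20% around each of the three nominal energies named in this section; the window width, the labels and the function name `classify_photon` are hypothetical and are not taken from the cited measurements.

```python
NOMINAL_ENERGIES_KEV = {
    "ct_low": 60,      # low-energy X-rays
    "ct_medium": 120,  # medium-energy photons used for CT
    "pet": 511,        # annihilation photons
}


def classify_photon(summed_energy_kev, tolerance=0.2):
    """Assign a validated energy sum to a modality, or reject it.

    tolerance is an assumed fractional half-width of each energy window.
    """
    for label, nominal in NOMINAL_ENERGIES_KEV.items():
        low = nominal * (1 - tolerance)
        high = nominal * (1 + tolerance)
        if low <= summed_energy_kev <= high:
            return label
    return "rejected"


for energy in (58, 125, 500, 250):
    print(energy, classify_photon(energy))
```

With the assumed windows, 58 keV and 125 keV are assigned to CT, 500 keV to PET, and 250 keV is rejected. In practice the windows would be set per channel from the measured energy resolution of each crystal layer.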
Increasing The Input Bandwidth To 3.5 x 10¹⁰ Events Per Second To Avoid Saturation
The input bandwidth of the system can be very high, so that saturation is avoided at any radiation activity for both CT and PET photon detection. With the 3D-Flow system, saturation is determined by the detector decay time (which affects only a small area of the detector, because the crystals are cut into 2x2 mm or 4x4 mm elements) and not by the electronics. The novel design offers a zero dead-time input bandwidth of 3.5 x 10¹⁰ single photons per second when a system made of 1,792 channels is used (calculated as 1,792 channels x 20 x 10⁶ events per second at each channel. See also Section 13.4.2 on page 88 and its implementation in Section 17.1.2 on page 176 of  for more detail). With a sampling rate of 20 MHz, the system designed in  can fetch data at the same channel at time intervals of only 50 ns (which is approximately the decay time of the fastest crystals, such as LSO).
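The bandwidth figures above follow directly from the channel count and the sampling rate, as the short Python check below shows. Only numbers quoted in the text are used; the variable names are illustrative.

```python
CHANNELS = 1_792
SAMPLING_RATE_HZ = 20_000_000  # 20 MHz per channel
WORD_BITS = 64                 # one 64-bit data packet per sample

events_per_second = CHANNELS * SAMPLING_RATE_HZ
sampling_interval_ns = 1_000_000_000 // SAMPLING_RATE_HZ
bits_per_second_per_channel = WORD_BITS * SAMPLING_RATE_HZ

print(f"{events_per_second:.3e} events/s")
print(sampling_interval_ns, "ns between samples")
print(bits_per_second_per_channel / 1e9, "Gbit/s per channel")
```

The product is 3.584 x 10¹⁰ events per second, which the text rounds to 3.5 x 10¹⁰, and the sampling interval is 50 ns.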
Selecting The Output Bandwidth As A Function Of The Highest Radiation Dose Delivered To The Patient For Capturing More Photons In Time Coincidence
The output bandwidth of the coincidence detection circuit of this design is related to the radiation activity. For example, for a radiation dosage of 2 mCi delivered to the patient, it is estimated that no more than 5 x 10⁶ coincidences per second will hit the detector of the new design (see Figure 14-1 in ), which has a FOV of over 137.4 cm. A coincidence circuit designed as described in Section 13.14.4 of  that can perform about 10⁸ comparisons per second, regardless of the number of detector elements used, is sufficient to fulfill the needs for that selected maximum radiation activity estimated at the detector. Conversely, the approach of detecting coincidences with less than 100% efficiency used in current PET , with only 15 cm FOV and about 56 modules, already requires several circuits performing about 3 x 10⁹ comparisons per second (see reference  and Section 13.4.14, and Section 14.7.2 of ). The current PET requires the high number of comparisons and the complex circuits because the approach is related to the number of detectors and not to the radiation activity.
Modular Implementation for IBM PC or VME platform for systems of any size
The modularity, flexibility, programmability and scalability of the 3D-Flow system for the electronics of 3D-CBS apply to all phases of the system, from the components to the IBM PC chassis (or crate(s) for the VME implementation).
The same hardware can be used to replace the electronics of current PET, as well as to build new systems of different sizes, making use of different detectors that provide analog and digital signals. The programmable 3D-Flow system can acquire, move, correlate, and process the signals to best extract the information of the incident photons and find the coincidences.
Two examples of implementation are:
One, based on the IBM PC platform (see Figure 12), has the advantage of providing the latest and most powerful CPUs and peripherals at the lowest price because of the large volume of its market. However, it has the disadvantage that particular care must be taken in the connectors and cables carrying the information between processors located on different boards. (See also details on IBM PC board layout and power consumption estimate in  Section 15.1.1 for 3D-Flow DAQ-DSP boards, Section 15.2.1 for the pyramidal and buffer 3D-Flow PY-BUF-DSP board, and Figure 15-7 for board-to-board connections through ribbon cables carrying the information to/from neighboring 3D-Flow processors).
The other, based on VME, has the advantage of a robust and reliable construction with the signals between processors on different boards carried through a secure backplane. However, the market for the latter is smaller, the prices are higher, and the boards with the latest components take more time to produce. (See details on VME board layout and power consumption estimate in  Sections 15.1.2 and 15.1.3 for 3D-Flow DAQ-DSP boards, Section 15.2.2 for the pyramidal and buffer 3D-Flow PY-BUF-DSP board, and Figure 15-7 for board-to-board connections through backplane carrying the information to/from neighboring 3D-Flow processors).
Selecting the number of crystals to couple to each PMT
Detectors of PET/CT devices of different sizes and of different components (crystals coupled to PMT or APD, photodiodes coupled to crystals, solid state detectors, etc.) can be mapped to the 3D-Flow system.
The ratio of 256 crystals (or a single crystal of equivalent size in a “continuous” detector type described in ) coupled to a photomultiplier of 38 mm in diameter has been selected:
Based on the promising results obtained by the tests performed by Andreaco and Rogers [ ] in decoding 256 BGO crystals per block without having reached a limit in the number of crystals that could be decoded.
Based on the number of photomultipliers per detector area used in several PET built by Karp and co-workers on the “continuous” detectors (e.g., 180 PMTs were used in the HEAD PENN-PET with the ring of 42 cm in diameter and 25.6 cm FOV. See [ ], and Table 1-3 and Sec. 184.108.40.206 of ). Each of the 1,792 PMT of the new PET design with 3D-Flow is coupled to an equivalent detector area.
In the event that the light emitted by a certain type of crystal adopted in a particular PET design is not sufficient, or the S/N ratio does not allow the decoding of 256 crystals, the number of PMTs and electronic channels can be multiplied by four and the 256-channel 3D-Flow DAQ-DSP board (see Section 220.127.116.11 of ) can be used in place of the 64-channel board.
Selecting the number of detector channels to be handled by 3D-Flow processors
For each platform, IBM PC, or VME, two systems have been designed (see Section 17 of  for details). For applications requiring less processing (e.g., using fast crystals such as LSO), a system with 4 channels for each 3D-Flow processor is described. For applications requiring higher computational need, such as when detectors with economical crystals (e.g., BGO) with a slower decay time are used, a system with one channel per processor is described.
Using slow crystals or fast crystals
When crystals with a slower decay time are used, a longer real-time algorithm performing digital signal processing to extract and measure particle properties from the signal is needed. Crystals with faster decay times require only one sampling for each photon detected before correlating it with signals from other subdetectors or neighboring detectors. (See Section 13.4.3 of  for detail).
Using The 3D-Flow Simulator To Evaluate The Efficiency Of Different Real-Time Algorithms In Extracting And Measuring Particle Properties From Different Detectors
The overall architecture of the front-end electronics of this PET design is based on a single type of circuit, the 3D-Flow processing element  consisting of fewer than 100K gates. It is technology independent and is replicated several times in a chip, on a board, and in a chassis (or crate). Unpartitionable real-time algorithms, such as the ones described in Section 18.104.22.168 of  for extracting and measuring particle properties, can be simulated with the 3D-Flow tools. A specific real-time program for each PET device can be downloaded into the 3D-Flow system to tune the photon identification and coincidence detection algorithm to a specific detector. Algorithm optimization and the tradeoff between performance and cost (that is, efficiency versus algorithm complexity) can be simulated before the system is built. The simulator can take its input patterns from files of PET events recorded during an examination or from files of events generated by Monte Carlo simulation. The real-time algorithm executed on the 3D-Flow architecture outputs the photons it has identified.
The 3D-Flow Design Real-Time tools allow the user to:
Create a new 3D-Flow application (called project) by varying system size (number of sensors in the PET), throughput, filtering algorithm, and routing network (connections between 3D-Flow chips), and by selecting the processor speed, lookup tables, number of input and output bits for each set of data packet received for each algorithm execution;
Simulate a specified parallel-processing system for a given algorithm on different sets of data. The flow of the data can be easily monitored and traced in any single processor of the system and in any stage of the process;
Monitor a 3D-Flow system in real-time via the RS232 interface, whether the system at the other end of the RS232 cable is real or virtual; and
Create a 3D-Flow chip accommodating several 3D-Flow processors by means of interfacing to the Electronic Design Automation (EDA) tools.
A flow diagram guides the user through the above four phases. A system summary displays the information for a 3D-Flow system created by the Design Real-Time tools.
The basic 3D-Flow component shown in Figure 5 has been implemented in a technology-independent form and synthesized in 0.5 µm and 0.35 µm technologies, and in Xilinx, Altera, and ORCA (Lucent Technologies) FPGAs. The most cost-effective solution is to build the 3D-Flow in 0.18 µm CMOS technology at 1.8 Volts, accommodating 16 3D-Flow processors with a die size of approximately 25 mm² and a power dissipation of 22 nW per gate per MHz. Each 3D-Flow processor has approximately 100K gates, giving a total of approximately 1.7 million gates per chip.
Example Of The Use Of This Design Set Forth For The Construction Of A PET/CT With A FOV Of 137.4 Cm
Figure 11 (bottom left) shows the logical and physical layout of the complete PET system consisting of an elliptical gantry 102 cm long (FOV) for the torso and a circular gantry 35.4 cm long (FOV) for the head. Any shape of gantry can be used, from the simplest circular gantry for head and torso to an elliptical gantry or the shape closest to the human body. Tradeoffs should be made between (a) the cost savings in crystals and the Time of Flight gained by the photon reaching the detector and (b) the higher complexity of the image reconstruction software (when shapes other than circular are used). The right-hand side of the figure shows the detail of each block of crystals coupled to a phototube 38 mm in diameter. Several layers of crystals with different decay times are required to measure the depth-of-interaction (DOI).
The signal of the phototube is sent to the first layer of a 3D-Flow system [ ]. A total of 1,792 phototubes are required (256 for the head and 1,536 for the torso). The top right of the figure shows the relationship between one phototube and the first layer of the 3D-Flow system (each board carries six layers plus one 3D-Flow chip for the pyramid; the bottom right of the figure shows how the chips from the top right are positioned on the PCB, one next to the other, with the arrow as shown in detail in Figure 5 and Figure 6). Sixteen processing elements (PEs) are accommodated in a chip, with four 3D-Flow chips per layer. There are six layers on a 3D-Flow DAQ-DSP board (see the relation between the detectors, photomultipliers, signals, and the 28 3D-Flow DAQ-DSP boards in the center left of the figure).
The functionality of the 3D-Flow DAQ-DSP board is described in Section 13 and in Section 15.1.1 of . The bottom right of Figure 11 shows the estimated component layout of the 3D-Flow DAQ-DSP board implemented for the IBM PC platform for all PET/CT functions. Each 64-channel 3D-Flow DAQ-DSP board (IBM PC or VME version) will consist of 32 AD9281 ADCs (or equivalent, 28-pin SSOP); 2 x 32-channel preamplifiers (256-pin FineLine BGA); 2 x 32-channel TDCs (225-pin BGA MO-151); 25 x 3D-Flow chips (672 FineLine BGA); and 3 x FPGAs (484-pin FineLine BGA). A SO-DIMM memory (28 mm x 67 mm) and an FPGA (484-pin FineLine BGA) can be installed on the back of the board if a larger buffer of CT events is desired for the CT scan.
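The board and chip counts quoted above follow from a few multiplications. The short sketch below only re-derives them from the figures given in the text; the variable names are illustrative and not part of the 3D-Flow tools.

```python
# Figures quoted in the text.
PMT_HEAD = 256
PMT_TORSO = 1536
CHANNELS_PER_BOARD = 64
PE_PER_CHIP = 16
CHIPS_PER_LAYER = 4
LAYERS_PER_BOARD = 6
PYRAMID_CHIPS_PER_BOARD = 1
ADC_PACKAGES_PER_BOARD = 32

total_pmt = PMT_HEAD + PMT_TORSO                  # phototubes = channels
boards = total_pmt // CHANNELS_PER_BOARD          # DAQ-DSP boards required
pe_per_layer = PE_PER_CHIP * CHIPS_PER_LAYER      # one PE per channel per layer
chips_per_board = CHIPS_PER_LAYER * LAYERS_PER_BOARD + PYRAMID_CHIPS_PER_BOARD
channels_per_adc = CHANNELS_PER_BOARD // ADC_PACKAGES_PER_BOARD

print("phototubes:", total_pmt)
print("boards:", boards)
print("PEs per layer:", pe_per_layer)
print("3D-Flow chips per board:", chips_per_board)
print("channels per ADC package:", channels_per_adc)
```

The result of 25 chips per board matches the "25 x 3D-Flow" entry in the component list, and the 32 ADC packages for 64 channels imply two channels per package.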
The connections carrying the information between neighboring 3D-Flow processors are described in detail in Section 15.3 and shown in Figure 15-7 of .
Figure 12 shows the layout of the hardware assembly for a 137.4 cm FOV PET/CT. The complete electronic system consists of 28 DAQ-DSP boards, two Pentium (or similar) CPUs, and one pyramidal coincidence-detection board with a patch panel for 28 connectors, as shown in Figure 12 (the pyramidal/coincidence board could also be accommodated in the operator console). It is housed in two IBM PC chassis with power supplies of up to 800 Watts each (e.g., CyberResearch 5U, motherboard PBPW 19P18, with 18 PCI slots plus 1 slot for the CPU; 14 slots are used by the 3D-Flow DAQ-DSP boards, leaving 4 free).
The center of Figure 12 shows the estimated component layout of the pyramidal board implemented for the IBM PC platform for all PET/CT functions. The pyramidal board receives the data relative to the photons validated by the real-time algorithm executed on the 3D-Flow DAQ-DSP boards through the patch panel shown in Figure 12. It then performs the functions described in : the attenuation correction (Section 13.4.2), the separation of the photons found into the two modalities (PET and CT), the channel reduction (Section 13.4.12), and the coincidence identification (Section 13.4.14). The board stores the results, that is, the coincidences found (or the single photons validated by the algorithm for CT when the buffer memories on the DAQ-DSP boards are not installed), in the two DIMM buffer memories. These can have a capacity of up to 4 GB each, for a total maximum of 1 billion events accumulated during a single study session (each event or coincidence is in the 64-bit format described in Section 22.214.171.124 of ). The buffer memory can be increased if needed. An additional 512 MB DIMM module stores the coefficients for the attenuation correction acquired by the CT during a calibration scan, as described in Sec. 13.4.6 of .
Results are read from the buffer memories by the IBM PC CPU via the PCI bus (or by the VME CPU via the VME bus) and can be formatted in any protocol before being sent to the graphic workstation via a standard high-speed local area network.
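The figure of 1 billion events can be checked from the buffer size and the event format. This is a minimal sketch that assumes 1 GB = 2**30 bytes and that each stored event occupies exactly 64 bits with no additional overhead; neither assumption is stated in the article.

```python
DIMM_COUNT = 2
DIMM_BYTES = 4 * 2**30   # assumption: 4 GB per DIMM, with 1 GB = 2**30 bytes
EVENT_BITS = 64          # event/coincidence format quoted in the text

event_bytes = EVENT_BITS // 8
max_events = DIMM_COUNT * DIMM_BYTES // event_bytes

print("bytes per event:", event_bytes)
print("maximum events buffered:", max_events)
```

Under these assumptions the two DIMMs hold 2**30 events, slightly more than 1 billion.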
How The Examination Cost Is Lowered Compared To The Current PET
Model for keeping the examination cost to the patient low and requiring a minimum investment in the machine, making the new 3D-CBS accessible to all hospitals
The cost of a 3D-CBS examination is estimated to be about $300 to $400, compared with $2,000 to $4,000 for an examination using the current PET, for two reasons. First, the radiation dose delivered to the patient is only 1/30th of the dose required in current PET (10 mCi of FDG tracer costs about $600, while 0.33 mCi of FDG is estimated to cost between $40 and $60). Second, about 40 to 60 patients can be scanned per day (assuming a 10-12 hour day), compared with the 6 to 7 scanned by the current PET. The scanning time is reduced to 3 to 4 minutes for 137.4 cm (because the bed is stationary and the scan is done in a single position), compared with about 55 minutes for 70 cm of scanning with the current PET. (That is based on five bed positions of 10 minutes each with a 14 cm FOV PET. See more details on page iii of ).
Large hospitals can purchase the 3D-CBS unit. To make the service accessible to all hospitals, including the small ones with limited capital for investment in expensive medical instrumentation, the PET manufacturers (or other investors such as insurance companies) may sell the PET service by the day, month, or year, as is already done by mobile PET companies. Using this model, the hospital would be charged only by the day (e.g., one day per week), at, for example, $10,000 for the current PET with a short FOV (based on a PET with a retail cost of about $2.5 million), and $15,000 for the new 3D-CBS with a longer FOV (based on a 3D-CBS which includes PET and CT with a retail cost of about $6 million). The current PET examines about 6 patients per day at a cost of about $2,100 each. (The charge for an examination can range from $1,790 to $4,200. Some HMO plans in the U.S. reimburse the hospital as much as $4,200 per scan; the Medicare reimbursement rate for a PET scan is $1,790). The greater number of patients examined per day, at a lower cost of the FDG tracer, should lower the examination cost to about $300 to $400 each. The goal price of $200 per examination could be reached when more 3D-CBS devices are available, generating price competition, and more people choose a faster and more sensitive cancer screening.
The current market for the 3D-CBS is the current CT and PET market plus the new preventive-medicine market for screening that was not envisioned before the development of the 3D-CBS. In 1990 there were about 45,000 CT scanners in the U.S. and they performed about 13 million examinations. In 1980, about 3,300,00 CT examinations were performed. The PET market is very small due to the high cost of the examination, the high radiation dose delivered to the patient with current PET, and the small number of units available today. (There are approximately 200 PET units in the U.S. today, and during the next year GE and Siemens plan to deliver 100 new units in total).
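One way to read the per-examination figures is as the daily rental charge spread over the patients scanned that day, plus the tracer cost. The sketch below is a simplified model under that assumption; it ignores staff, facility, and other costs, which the article does not itemize, and the function name is illustrative.

```python
def cost_per_exam(daily_rental_usd, patients_per_day, tracer_usd):
    """Rental share per patient plus tracer cost (simplified model)."""
    return daily_rental_usd / patients_per_day + tracer_usd


# Current PET: $10,000 per day, 6 to 7 patients, about $600 of FDG each.
current_pet = [cost_per_exam(10_000, n, 600) for n in (6, 7)]

# 3D-CBS: $15,000 per day, 40 to 60 patients, $40 to $60 of FDG each.
# Worst case pairs few patients with the dearer dose, best case the opposite.
cbs = [cost_per_exam(15_000, n, t) for n, t in ((40, 60), (60, 40))]

print("current PET, USD per exam:", [round(c) for c in current_pet])
print("3D-CBS, USD per exam:", [round(c) for c in cbs])
```

With these inputs the model lands near the quoted figures: a little over $2,000 per examination for the current PET and a range that brackets $300 to $400 for the 3D-CBS.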
Because of the new possibility, afforded by this design, of using these examinations as an annual screening technique for asymptomatic people (due to the low radiation dose delivered to the patient well within the acceptable limits of radiation exposure, and the lower cost of the examination of the 3D-CBS), the market size will increase every year as the public becomes aware of the availability of this screening. This will provide a benefit for the entire population by lowering the cost of health care.
How can a device that extends the FOV by about 7 times and increases the efficiency over 400 times cost only 2 to 3 times the current PET?
The book  details the designs that allow for a PET with about 7 times the field of view but without 7 times the cost. (See Section 18 on page 184 and Figure 1-5 on page 10 of ). The innovations, such as the elliptical crystal detector arrangement, maximize utilization of the detectors while minimizing the cost; e.g., the photomultipliers number less than twice as many, rather than seven times as many. The principal advantage derives from the electronics: it costs less and has higher performance due to its unique conceptual approach. The costs of the new PET design and the component parts are based on the actual cost of the current manufacturers of PET and manufacturers of isotopes, photomultipliers, and crystal detectors.
The cost of the main components of a 3D-CBS, assuming the cost of the crystals to be about $10/cm³, is: about $500,000 for the crystals (calculated for a 25-mm thick, small ring for the head, and elliptical form for the torso); about $350,000 for the phototubes (assuming the cost of the 1 1/2” PMT to be $200 each, 1,792 PMTs will cost about $350,000); and the electronics is estimated to cost about $200,000 (calculated as 28 x 3D-Flow DAQ-DSP boards with 64 channels each, costing $5,000 each, plus $60,000 for two IBM PC CPUs, two IBM PC chassis, one 3D-Flow pyramidal board, hard drives, ancillary logic, and cables. See Figure 11 of this article and Section 17.2 on page 181 of  for details), for a total of about $1 million. An equivalent pricing of the main components applied to the current PET available on the market requires one to multiply this number by five to include assembly and other parts in order to obtain the estimated retail price of $5 million. (Using faster crystals will increase the cost; however, because the same detector can be used by both modalities, PET and CT, this increase will apply only one time).
For comparison, the following calculation can be made on the largest PET commercially available: The volume of the crystals of a CTI/Siemens 966/EXACT3D is about 13,602 cm³. Assuming the cost of BGO to be $10/cm³, the cost of the crystals is $136,020. Assuming the cost of a 3/4” PMT to be $160 each, 1,728 PMTs cost $276,480. Estimating the cost of the electronics to be $100,000, the total cost of the main materials of a 966/EXACT3D is about $512,500. When all other components such as assembly, software, marketing, etc. are included, the price must be multiplied by 5 to arrive at about $2.5 million for the retail price.
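The two estimates can be reproduced side by side. This sketch uses only the unit prices and quantities quoted in the text; the dictionary layout and names are illustrative, and the unrounded totals differ slightly from the rounded figures in the article.

```python
RETAIL_MULTIPLIER = 5  # factor applied in the text to reach a retail price

cbs_parts = {
    "crystals": 500_000,                  # quoted directly, at about $10/cm3
    "phototubes": 1_792 * 200,            # 1 1/2" PMTs at $200 each
    "electronics": 28 * 5_000 + 60_000,   # 28 boards plus CPUs, chassis, etc.
}

exact3d_parts = {
    "crystals": 13_602 * 10,              # 13,602 cm3 of BGO at $10/cm3
    "phototubes": 1_728 * 160,            # 3/4" PMTs at $160 each
    "electronics": 100_000,               # estimate quoted in the text
}

cbs_total = sum(cbs_parts.values())
exact3d_total = sum(exact3d_parts.values())

for name, total in (("3D-CBS (PET part)", cbs_total), ("966/EXACT3D", exact3d_total)):
    print(name, "materials:", total, "retail estimate:", total * RETAIL_MULTIPLIER)
```

The unrounded 3D-CBS materials total comes to slightly more than the "about $1 million" quoted, because the phototube cost is $358,400 before rounding.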
How much does the radioisotope cost
The limiting factor in the widespread adoption of PET is the low efficiency of the existing electronics, which requires the delivery of a high radiation dose to the patient and makes it impossible to examine more than 6 to 7 patients per day. Besides PET cancer studies, several other studies, such as studies of the heart or checks on epilepsy, could be performed more frequently for diagnostic purposes if the efficiency of PET were higher and did not require the delivery of a high radiation dose to the patient. The 3D-CBS makes these examinations cost-effective and, with the lower level of radiation required, safe for the patients.
The low number of examinations per day made on current PET has the effect of increasing the examination cost due to the almost fixed cost of the production of the radioisotope.
Radioisotopes with a very short half-life, such as ¹⁵O-water (half-life of 124 sec), ¹³N-ammonia (half-life of 600 sec), and ¹¹C-methionine (half-life of 1,218 sec), require a cyclotron nearby. The cost of a cyclotron is about $2 million, and there is not much cost increase in generating 40 doses of low radiation units versus 6 doses of high radiation units.
When the ¹⁸F-FDG (half-life 110 minutes) tracer is used, it can be produced in one center with a cyclotron and distributed to several satellite PET sites within a few hours of the center. For example, for a 10 mCi dose at the satellite PET seven hours after production by the radiopharmaceutical laboratory, 160 mCi is required rather than just 10 mCi.
For cardiac studies using ⁸²Rb (75 sec half-life), the radioisotope is produced on site, which requires the purchase of an infusion pump unit ($75,000 one-time cost) and of a generator ($25,000) every month. This generator allows for production of one unit dose of 50 mCi every 10 minutes for 28 days.
Both now and in the past, the combined cost of a PET and a cyclotron could support examinations at an affordable price, with the investment amortized in a short time as described in the previous section, if a high-efficiency PET with a high patient throughput, such as the one set forth in this article and in , became available. (The improvement in efficiency should come mainly from the electronics, because excellent detectors have been available for many years).
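The activity that must be produced follows from the exponential decay law, A_produced = A_needed x 2^(t / T_half). The sketch below applies it to the example in the text; the function name is illustrative. With the 110-minute half-life and exactly seven hours, the formula gives roughly 141 mCi, while the 160 mCi quoted corresponds to exactly four half-lives (7 hours 20 minutes), so the quoted value appears to be a rounded figure.

```python
def activity_needed(target_mci, elapsed_min, half_life_min):
    """Activity to produce now so that target_mci remains after elapsed_min."""
    return target_mci * 2 ** (elapsed_min / half_life_min)


FDG_HALF_LIFE_MIN = 110  # half-life of 18F quoted in the text

seven_hours = activity_needed(10, 7 * 60, FDG_HALF_LIFE_MIN)
four_half_lives = activity_needed(10, 4 * FDG_HALF_LIFE_MIN, FDG_HALF_LIFE_MIN)

print("needed for 10 mCi after 7 h:", round(seven_hours, 1), "mCi")
print("needed for 10 mCi after 4 half-lives:", round(four_half_lives, 1), "mCi")
```

The same function shows the benefit of the lower dose: the activity to be produced scales linearly with the dose required at the satellite site.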
What is the additional cost of the CT section of the scanner?
The additional cost of the CT section (see Figure 3) includes only the cost of the electron beam generator, the focus and deflection coils, the tungsten target rings, and the vacuum pump system. The other components such as the detectors, photomultipliers and the electronics are the same as the ones used for PET.
The cost of the additional components for the CT scanner has been generously estimated at $1 million. The CT + PET will make a 3D-CBS device with a cost of about $6 million.
A new unique implementation of the bottom-to-top port connection between different processors of different 3D-Flow chips has been designed as shown in Figure 5 and described in Section V-C. Its implications in simplifying the hardware and enabling an increase in the efficiency of the electronics for fast real-time DAQ and processing systems are significant.
An application of such a technique in a 3-D Complete Body Scan (3D-CBS) is described in this article.
Other areas of the 3D-CBS application benefiting from this design are: hardware, software, cabling, system architecture, component architecture, detector element layout, interface between electronics and detector elements, data acquisition and processing, and detection of coincidences.
This implementation of the bottom-to-top port connection and the new 3D-Flow architecture achieves the following:
It solves the problems of PCB power dissipation, ground bouncing, noise, signal cross-talk, skew, etc. created by using generic I/O drivers that are capable of driving signals over distances of meters but consume 50 mW each, for a total of 33.6 Watts on a single board when used on 672 connections. The problem is solved by the use of a low-power driver/receiver (a few mW) with a low differential voltage signal of a few hundred mV, designed to use the minimal amount of power needed to send a signal on short, equidistant PCB traces with matched impedance.
It allows the building of a high-throughput, programmable, real-time 3D-Flow system with several processing units, each one executing in its entirety a thorough, unpartitionable particle identification algorithm with an execution time longer than the time interval between two consecutive input data. (The algorithm extracts and accurately measures photon properties such as energy, position, and timing with high resolution; measures the DOI to eliminate the parallax error; correlates data from different subdetectors and/or from neighboring detectors for more accurate measurements; and improves the signal-to-noise ratio and resolves signal pileup with DSP processing techniques).
It eliminates dead-time and system saturation by providing an input bandwidth of up to 35 billion events per second when sampling the detector at 20 MHz on 1,792 channels.
It finds more photons in coincidence using less circuitry because the implementation approach is related to the radiation activity rather than to the number of detectors, as in the current PET.
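The power and bandwidth figures quoted in the list above can be reproduced directly. In this sketch the 3 mW value for the low-power driver is a hypothetical stand-in for "a few mW"; all other numbers are taken from the text.

```python
CONNECTIONS = 672          # board-level connections quoted in the text
GENERIC_DRIVER_MW = 50     # generic I/O driver power, per connection
LOW_POWER_DRIVER_MW = 3    # assumption: stands in for "a few mW"

SAMPLING_HZ = 20_000_000   # 20 MHz detector sampling
CHANNELS = 1_792           # one channel per phototube


def board_power_w(driver_mw, connections=CONNECTIONS):
    """Total driver power on one board, in watts."""
    return connections * driver_mw / 1000


generic_w = board_power_w(GENERIC_DRIVER_MW)
low_power_w = board_power_w(LOW_POWER_DRIVER_MW)
input_rate = SAMPLING_HZ * CHANNELS  # data packets per second, whole system

print("generic drivers:", generic_w, "W per board")
print("low-power drivers (assumed 3 mW):", low_power_w, "W per board")
print("input bandwidth:", input_rate / 1e9, "billion packets per second")
```

The input rate works out to 35.84 billion packets per second, which the text rounds to 35 billion.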
The design of the 3D-CBS system, which incorporates a PET and a CT in a single unit with an extended FOV of 137.4 cm, consists of ~50,000 cm³ of crystals, 1,792 PMTs (square or round, 1 1/2”), an electron beam gun, a vacuum system, ~52 tungsten half-rings, focus and deflection coils for the electron beam, a whole-body gantry, and two IBM PC chassis (or two VME crates), each containing 14 x 64-channel 3D-Flow DAQ-DSP boards and a Pentium (or equivalent) CPU board. A single 3D-Flow pyramidal board installed in the IBM PC console collects the “good” photons found by the 28 x 3D-Flow DAQ-DSP boards through a patch panel (see Figure 3, Figure 11, and Figure 12).
The article and the referenced book  detail blueprints of an entire 3D-CBS system (detector layout, the integration of the CT electron beam in the combined multi-modal device, electronics layout, etc.) that does not require the use of septa rings between detector rings, that fully extracts particle properties, that does not saturate, and that completely eliminates motion artifacts. This increases the sensitivity for both PET and CT modalities acquired at the same time, with a very significant reduction in radiation to the patient (1/30th).
The improvements of the 3D-CBS fully exploit for the first time the far superior ability of the PET technology to emit photons in diametrically opposed directions. Combining PET/CT in one device further assists physicians in clinical examinations; viewing PET functional imaging data in conjunction with CT morphologic cross-section is sometimes mandatory if lesions are found.
Comparing the new design set forth in this article, with its improved sensitivity and elimination of motion artifacts, to existing non-invasive imaging devices, advantages in addition to lower examination cost will be seen to be higher resolution, no image blurring, higher data quality, and better detection ability, which will help to avoid erroneous readings (false positives). Only a good quality image from a PET with high sensitivity and without motion artifacts, interpreted by an experienced radiologist, can indicate whether and to what extent certain diseases are present in the body.
The design described in this article allows one to improve the efficiency of current PET with short FOVs by capturing more photons at lower cost, and allows one to dramatically increase the FOV for the construction of a more cost-effective PET/CT.
Before the present design was devised, it was not economically advantageous to build a PET/CT with increased FOVs, because the benefit of increased sensitivity and decreasing examination time was not thought to offset the cost increase. However, cost effectiveness can be achieved by broadening the market to include rental of the equipment to smaller hospitals and by recognizing the economy of performing ten times as many examinations as current PET can do for the same FOV at a cost only 2 to 3 times greater.
A study conducted by the National Cancer Institute determined that cancer costs $107 billion per year in the U.S.: $37 billion for direct medical costs, $11 billion for morbidity costs (cost of loss of productivity), and $59 billion for mortality costs. Early detection, in addition to providing a better quality of life for people, will allow the patient to avoid the expensive procedures that are typical when the cancer is found at an advanced stage or has metastasized. A practical, affordable device offering early detection would offer savings in the big picture as well. Thus the introduction of a 3D-CBS into the market is desirable from many different aspects, including that of cost reduction.